Kazuaki Ishizaki, IBM Research
Madhusudanan Kandasamy, IBM Systems
Transparent GPU Exploitation
on Apache Spark
#Res9SAIS
About Me – Madhusudanan Kandasamy
• STSM (Principal Engineer) at IBM Systems
• Working on IBM Power Systems for over 15 years
– AIX & Linux OS Development
– Apache Spark Optimization for Power Systems
– Distributed ML/DL Framework with GPU & NVLink
• IBM Master Inventor (20+ patents, 18 disclosure publications)
• Committer of GPUEnabler
– Apache Spark plug-in to execute GPU code on Spark
• https://github.com/IBMSparkGPU/GPUEnabler
• GitHub: https://github.com/kmadhugit
• E-mail: madhusudanan@in.ibm.com
2Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
What You Will Learn from This Talk
• Why accelerate your workloads using GPUs on Spark
– GPU/CUDA overview
– Spark + GPU for ML workloads
• How to program GPUs in Spark
– Invoke hand-tuned GPU programs written in CUDA
– Translate DataFrame programs to GPU code automatically
• What the key factors for accelerating a program are
– Parallelism in a program
– Data layout in memory
GPU/CUDA Overview
• GPGPU: general-purpose computing on GPUs, designed for throughput
• CUDA, NVIDIA's widely used GPU programming model, requires programmers to explicitly write operations to
– allocate/deallocate device memory
– copy data between CPU and GPU
– execute the GPU kernel
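// code for GPU
__global__ void GPUMultiplyBy2(
    const float* d_a, float* d_b, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
  if (n <= i) return;                             // skip threads past the end of the array
  d_b[i] = d_a[i] * 2.0f;
}
// code for CPU
void fooCUDA(const float *A, float *B, int N) {
  size_t sizeN = N * sizeof(float);
  float *d_A, *d_B;
  cudaMalloc(&d_A, sizeN); cudaMalloc(&d_B, sizeN);    // allocate device memory
  cudaMemcpy(d_A, A, sizeN, cudaMemcpyHostToDevice);   // copy input CPU -> GPU
  int threads = 256;                                   // threads per block
  GPUMultiplyBy2<<<(N + threads - 1) / threads, threads>>>(d_A, d_B, N);  // execute kernel
  cudaMemcpy(B, d_B, sizeN, cudaMemcpyDeviceToHost);   // copy result GPU -> CPU (waits for the kernel)
  cudaFree(d_B); cudaFree(d_A);                        // deallocate device memory
}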
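A common way to size the launch for a kernel like GPUMultiplyBy2 is to round the grid up to whole blocks and let each thread compute the global index blockIdx.x * blockDim.x + threadIdx.x, returning early when that index falls past n. The C++ sketch below checks this arithmetic on the CPU. gridSizeFor, idleThreads, and emulateMultiplyBy2 are hypothetical helpers, not CUDA APIs, and the nested loops are only a sequential stand-in for threads that a GPU runs in parallel.

```cpp
#include <cassert>
#include <vector>

// Number of blocks so that blocks * threadsPerBlock >= n (ceiling division).
int gridSizeFor(int n, int threadsPerBlock) {
  assert(threadsPerBlock > 0);
  return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads launched past the end of the array; the kernel's bounds check idles them.
int idleThreads(int n, int threadsPerBlock) {
  return gridSizeFor(n, threadsPerBlock) * threadsPerBlock - n;
}

// Sequential stand-in for kernel<<<blocks, threadsPerBlock>>>: the outer loop plays
// the role of blockIdx.x and the inner loop the role of threadIdx.x.
std::vector<float> emulateMultiplyBy2(const std::vector<float>& a, int threadsPerBlock) {
  int n = static_cast<int>(a.size());
  std::vector<float> b(a.size(), 0.0f);
  int blocks = gridSizeFor(n, threadsPerBlock);
  for (int block = 0; block < blocks; ++block) {
    for (int thread = 0; thread < threadsPerBlock; ++thread) {
      int i = block * threadsPerBlock + thread;  // global thread index
      if (i >= n) continue;                      // same guard as the kernel's early return
      b[i] = a[i] * 2.0f;
    }
  }
  return b;
}
```

For n = 1000 and 256 threads per block, gridSizeFor returns 4 blocks, and the last 24 threads of the final block do no work. Without the bounds check those threads would read and write past the end of the arrays.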
Spark + GPU for ML workloads
• Spark provides efficient ways to parallelize jobs across a cluster of nodes
• GPUs provide thousands of cores to parallelize a job efficiently within a node
• GPUs can provide up to 100x the processing throughput of CPUs *
• Combine Spark + GPU for lightning-fast processing
– We will talk about two approaches
* https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/
Outline
• Why accelerate your workloads using GPUs on Spark
• How to program GPUs in Spark
– Invoke hand-tuned GPU programs in CUDA
– Translate DataFrame programs to GPU code automatically
• Toward faster GPU code
• How do the two frameworks work?
Invoke Hand-tuned GPU program in CUDA
• GPUEnabler to simplify development
• Implemented as a Spark package
– Can be dropped into your version of Spark
• Easily launch hand-coded GPU kernels from the map() or reduce() parallel functions of RDD and Dataset
• Manages GPU memory, copies data between GPU and CPU, and converts data formats
• Available at https://github.com/IBMSparkGPU/GPUEnabler
Example - hand-tuned CUDA kernel in Spark
CUDA is a programming language for GPUs defined by NVIDIA. PTX is an assembly language; a PTX file can be generated from a CUDA file.
Step 1: Write CUDA kernels (without memory management and data copy)
__global__ void multiplyBy2(int *in, int *out, long size) {
  long i = threadIdx.x + blockIdx.x * blockDim.x;  // global element index
  if (size <= i) return;                           // skip threads past the end
  out[i] = in[i] * 2;
}
Step 2: Write Spark program
import org.apache.spark.{SparkConf, SparkContext}
import com.ibm.gpuenabler.CUDAFunction
import com.ibm.gpuenabler.CUDARDDImplicits._  // adds mapExtFunc to RDDs
object SparkExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkExample"))
    val mapFunction = new CUDAFunction("multiplyBy2", Seq("value"), "example.ptx")
    val output = sc.parallelize(1 to 65536, 24).cache
      .mapExtFunc(x => x * 2, mapFunction).collect()  // RDDs have no show(); collect() returns the results
  }
}
Step 3: Compile and submit (spark-submit options must precede the application JAR)
$ nvcc example.cu -ptx
$ mvn package
$ bin/spark-submit --packages com.ibm:gpu-enabler_2.11:1.0.0 --class SparkExample SparkExample.jar
Performance Improvements of GPU Program over Parallel CPU
• Achieved a 3.6x speedup for CUDA-based mini-batch logistic regression using one P100 card over POWER8 with 160 SMT threads
[Chart: relative execution time over the GPU version (shorter is better)]
IBM Power System S822LC for High Performance Computing “Minsky”, at 4 GHz with 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0, IBM Java pxl6480sr4fp2-20170322_01 (SR4 FP2), 128GB heap, Apache Spark 2.0.1, master=“local[160]”, GPUEnabler as of 2017/5/1, N=112000, features=8500, iterations=15, mini-batch size=10, parallelism(GPU)=8, parallelism(CPU)=320
Outline
• Why accelerate your workloads using GPUs on Spark
• How to program GPUs in Spark
– Invoke hand-tuned GPU programs in CUDA (GPUEnabler)
– Translate DataFrame programs to GPU code automatically
• Toward faster GPU code
• How do the two frameworks work?
“Transparent” GPU Exploitation
• Enhanced Spark by modifying its source code
• Accepts expressions in DataFrame select(), selectExpr(), and reduce()
• Automatically generates CUDA code from a DataFrame program
• Automatically manages GPU memory and copies data between GPU and CPU
• No data format conversion is required
Example - “Transparent” GPU Exploitation
• Write a Spark program using DataFrame
• Compile and submit it
object SparkExample {
  def main(args: Array[String]): Unit = {
    val spark = org.apache.spark.sql.SparkSession.builder.appName("SparkExample").getOrCreate()
    import spark.implicits._  // enables toDF and $"value"
    spark.sparkContext.parallelize(1 to 65536, 24).toDF("value").cache
      .select($"value" * 2).cache.show()
  }
}
$ mvn package
$ bin/spark-submit --class SparkExample SparkExample.jar
Performance Improvements of Spark DataFrame Program over Parallel CPU
• Achieved a 1.7x speedup for Spark vector multiplication using one P100 card over POWER8 with 160 SMT threads
[Chart: relative execution time over the GPU version (shorter is better)]
IBM Power System S822LC for High Performance Computing “Minsky”, at 4 GHz with 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0, IBM Java pxl6480sr4fp2-20170322_01 (SR4 FP2), 128GB heap, based on Apache Spark master (id:657cb9), master=“local[160]”, N=480, vector length=1600, parallelism(GPU)=8, parallelism(CPU)=320
About Me – Kazuaki Ishizaki
• Researcher at IBM Research in compiler optimization
• Working on the IBM Java virtual machine for over 20 years
– In particular, the just-in-time compiler
• Active contributor to Spark since 2016
– 98 commits, #37 in the world (25 commits, #8 in 2018)
• Committer of GPUEnabler
• Homepage: http://ibm.biz/ishizaki
• GitHub: https://github.com/kiszk, Twitter: @kiszk
Toward Faster GPU Code
• Assign many parallel computations to GPU cores
• Reduce the number of memory transactions to GPU memory
Assign Many Parallel Computations to GPU Cores
• Achieve high utilization of the GPU
[Figure: two panels, "Achieve high performance" and "Achieve low performance", contrasting GPU cores that are all busy with cores that are partly idle]
Reduce the Number of Memory Transactions
• The number of memory transactions a program issues depends on its memory layout
[Figure: GPU cores load the fields of Pt(x: Int, y: Int), first "Load Pt.x", then "Load Pt.y". Assumption: 4 consecutive data elements can be coalesced by the GPU hardware.
– Row-oriented layout (x1 y1 x2 y2 x3 y3 x4 y4): 4 memory transactions to GPU device memory
– Column-oriented layout (x1 x2 x3 x4 y1 y2 y3 y4): 2 memory transactions to GPU device memory]
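The 4 vs. 2 count can be reproduced with a small model. This C++ sketch assumes, as the slide does, that each aligned group of 4 consecutive elements is fetched in one transaction; real GPUs coalesce by byte-sized memory segments, so this is a simplification. countTransactions, rowIndex, colIndex, and transactionsForLoads are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Simplified coalescing model: elements whose indices fall in the same aligned
// group of `segment` consecutive elements are served by one memory transaction.
int countTransactions(const std::vector<int>& elementIndices, int segment) {
  assert(segment > 0);
  std::set<int> touched;  // distinct segments accessed by one load instruction
  for (int idx : elementIndices) touched.insert(idx / segment);
  return static_cast<int>(touched.size());
}

// Position of field f (0 = x, 1 = y) of point p in each layout of Pt(x, y).
int rowIndex(int p, int f) { return p * 2 + f; }                         // x1 y1 x2 y2 ...
int colIndex(int p, int f, int numPoints) { return f * numPoints + p; }  // x1 x2 ... y1 y2 ...

// numPoints GPU threads execute "Load Pt.x" together, then "Load Pt.y" together;
// returns the total number of memory transactions for both loads.
int transactionsForLoads(bool columnLayout, int numPoints, int segment) {
  int total = 0;
  for (int f = 0; f < 2; ++f) {
    std::vector<int> indices;
    for (int p = 0; p < numPoints; ++p)
      indices.push_back(columnLayout ? colIndex(p, f, numPoints) : rowIndex(p, f));
    total += countTransactions(indices, segment);
  }
  return total;
}
```

With 4 points and 4-element segments, transactionsForLoads(false, 4, 4) returns 4 for the row-oriented layout: the x values at indices 0, 2, 4, 6 span two segments, and so do the y values. transactionsForLoads(true, 4, 4) returns 2, because the column-oriented layout places all x values in one segment and all y values in another.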
Toward Faster GPU Code
• Assign many parallel computations to GPU cores
– A Spark program is already written as a set of parallel operations
• e.g. map, join, …
• Reduce the number of memory transactions
– Column-oriented layout achieves better performance
• The paper* reports a 3.3x performance improvement in GPU kernel execution of k-means over the row-oriented layout
* Shuai Che, Jeremy W. Sheaffer, and Kevin Skadron. “Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems”, SC’11
Questions
• How can we write a parallel program for GPU on Spark?
– Thanks to the Spark programming model, a Spark program is already a set of parallel operations
• How can we use column-oriented storage for GPU in Spark?
Outline
• Why accelerate your workloads using GPUs on Spark
• How to program GPUs in Apache Spark
– Invoke hand-tuned GPU programs in CUDA (GPUEnabler)
– Translate DataFrame programs to GPU code automatically
• Toward faster GPU code
• How do the two frameworks work?
Data Movement with GPUEnabler
• Data in an RDD is moved to off-heap memory as column-oriented storage
[Figure: CPU side (Java heap, off-heap) and GPU. RDD1 is cached in row-oriented storage on the Java heap; a format conversion produces column-oriented storage off-heap, which is copied to the GPU where map runs; results are copied back and converted to row-oriented storage to produce RDD3.]
rdd1.cache
val rdd2 = rdd1.mapExtFunc(…)
val rdd3 = rdd2.mapExtFunc(…)
• Format conversions and data copies are done automatically
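The row-to-column format conversion can be pictured as an array-of-structures to structure-of-arrays transformation. The C++ below is an illustrative assumption about what such a conversion does, not GPUEnabler's actual off-heap implementation; Point, PointColumns, toColumns, and toRows are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x; int y; };  // row-oriented record, as in Point(x, y)

struct PointColumns {            // column-oriented storage: one array per field
  std::vector<int> x;
  std::vector<int> y;
};

// Row -> column: done before copying data to the GPU.
PointColumns toColumns(const std::vector<Point>& rows) {
  PointColumns cols;
  cols.x.reserve(rows.size());
  cols.y.reserve(rows.size());
  for (const Point& p : rows) {
    cols.x.push_back(p.x);
    cols.y.push_back(p.y);
  }
  return cols;
}

// Column -> row: done after copying results back, if the consumer expects rows.
std::vector<Point> toRows(const PointColumns& cols) {
  assert(cols.x.size() == cols.y.size());
  std::vector<Point> rows(cols.x.size());
  for (std::size_t i = 0; i < rows.size(); ++i) rows[i] = Point{cols.x[i], cols.y[i]};
  return rows;
}
```

The columns x and y are exactly the separate arrays a kernel taking inx/iny pointers expects. Each round trip costs a full pass over the data, which is why keeping intermediate results in column-oriented form between GPU operations pays off.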
How To Write GPU Code in GPUEnabler
• Write a GPU kernel that corresponds to the function passed to map()
Spark program:
val rdd2 = rdd1
  .mapExtFunc(p => Point(p.x*2, p.y*2), mapFunction)
GPU kernel (each field of Point is passed as a separate column array):
__global__ void multiplyBy2(int *inx, int *iny,
    int *outx, int *outy, long size) {
  long i = threadIdx.x + blockIdx.x * blockDim.x;
  if (size <= i) return;
  outx[i] = inx[i] * 2; outy[i] = iny[i] * 2;
}
Data Movement with DataFrame
• The DataFrame cache already uses column-oriented storage
• We enhanced Spark to create the DataFrame cache in off-heap memory
– This reduces format conversion and data copies between the Java heap and off-heap memory
[Figure: CPU side (Java heap, off-heap) and GPU. df1's cache, in column-oriented storage off-heap, is copied to the GPU, where the two maps run together ("map + map"); the result is copied back into df3's cache, also in column-oriented storage off-heap.]
df1.cache
val df2 = df1.map(…)
val df3 = df2.map(…).cache
• Data copies between off-heap memory and the GPU are done automatically
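Running the two maps together ("map + map") avoids materializing the intermediate result between them. As a generic illustration of that operator-fusion idea, the C++ below compares two passes that write and re-read an intermediate array with a single fused pass. It is not the code Spark or the authors' prototype generates, and the two element functions (* 2, then + 1) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: each map is a separate pass, and the intermediate result is
// written to and re-read from memory (like two separate kernels).
std::vector<int> mapThenMap(const std::vector<int>& in) {
  std::vector<int> tmp(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = in[i] * 2;   // first map
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = tmp[i] + 1;  // second map
  return out;
}

// Fused: one pass applies both functions to each element, so no
// intermediate array is allocated, written, or read back.
std::vector<int> fusedMap(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * 2 + 1;
  return out;
}
```

Both versions compute the same result. The fused one roughly halves the memory traffic (one read and one write per element instead of two of each), which matters on a GPU where kernels like these are usually memory-bound.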
How To Generate GPU Code from DataFrame
• Added a new path in Catalyst to generate CUDA code
[Figure, derived from "Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming" by Michael Armbrust: Catalyst turns a DataFrame program into an unresolved logical plan, then a logical plan (Analysis), an optimized logical plan (Optimizations), and a physical plan (Planning); code generation emits Java, and the newly added path also emits CUDA for the kernel.]
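To make "generate CUDA code from a DataFrame expression" concrete, here is a toy C++ generator that turns an expression tree such as $"value" * 2 into an element-wise kernel. This is a hypothetical sketch (Expr, col, lit, mul, add, emitExpr, and emitKernel are invented names), not the authors' Catalyst extension, which works on Spark's own expression and plan classes.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical minimal expression tree, standing in for expressions such as $"value" * 2.
struct Expr {
  enum Kind { Column, Literal, Multiply, Add };
  Kind kind;
  std::string name;                // column name (Column)
  int value;                       // constant (Literal)
  std::shared_ptr<Expr> lhs, rhs;  // operands (Multiply, Add)
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr col(const std::string& n) { return std::make_shared<Expr>(Expr{Expr::Column, n, 0, nullptr, nullptr}); }
ExprPtr lit(int v) { return std::make_shared<Expr>(Expr{Expr::Literal, "", v, nullptr, nullptr}); }
ExprPtr mul(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Expr::Multiply, "", 0, a, b}); }
ExprPtr add(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Expr::Add, "", 0, a, b}); }

// Emits the CUDA C expression that computes element i.
std::string emitExpr(const Expr& e) {
  switch (e.kind) {
    case Expr::Column:   return e.name + "[i]";
    case Expr::Literal:  return std::to_string(e.value);
    case Expr::Multiply: return "(" + emitExpr(*e.lhs) + " * " + emitExpr(*e.rhs) + ")";
    case Expr::Add:      return "(" + emitExpr(*e.lhs) + " + " + emitExpr(*e.rhs) + ")";
  }
  return "";  // unreachable for valid kinds
}

// Wraps the expression in an element-wise kernel that reads one int column.
std::string emitKernel(const std::string& kernelName, const std::string& inCol, const Expr& e) {
  return "__global__ void " + kernelName + "(const int* " + inCol +
         ", int* out, long n) {\n"
         "  long i = blockIdx.x * (long)blockDim.x + threadIdx.x;\n"
         "  if (i >= n) return;\n"
         "  out[i] = " + emitExpr(e) + ";\n}\n";
}
```

For example, emitExpr(*mul(col("value"), lit(2))) yields (value[i] * 2), and emitKernel wraps it in a kernel that assigns that expression to out[i] after the usual bounds check. The resulting source string would then have to be compiled (for instance to PTX) before it can be launched.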
Takeaway
• Why accelerate your workloads using GPUs on Spark
– Achieved up to 3.6x over 160-CPU-thread parallel execution
• How to use GPUs on Spark
– Invoke hand-tuned GPU programs in CUDA (GPUEnabler)
– Translate DataFrame programs to GPU code automatically
• How the two approaches execute programs on GPUs
– Both target ease of programming for many non-experts, not the state-of-the-art performance that a small number of top-notch programmers can achieve
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Ad

Recently uploaded (20)

DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 

Transparent GPU Exploitation on Apache Spark with Kazuaki Ishizaki and Madhusudanan Kandasamy

  • 1. Kazuaki Ishizaki, IBM Research Madhusudanan Kandasamy, IBM Systems Transparent GPU Exploitation on Apache Spark #Res9SAIS
  • 2. About Me – Madhusudanan Kandasamy
    • STSM (Principal Engineer) at IBM Systems
    • Has worked on IBM Power Systems for over 15 years
      – AIX & Linux OS development
      – Apache Spark optimization for Power Systems
      – Distributed ML/DL framework with GPU & NVLink
    • IBM Master Inventor (20+ patents, 18 disclosure publications)
    • Committer of GPUEnabler
      – Apache Spark plug-in to execute GPU code on Spark
      – https://github.com/IBMSparkGPU/GPUEnabler
    • GitHub: https://github.com/kmadhugit
    • E-mail: madhusudanan@in.ibm.com
    Transparent GPU Exploitation on Apache Spark, Ishizaki & Kandasamy, #Res9SAIS
  • 3. What You Will Learn from This Talk
    • Why accelerate your workloads using GPUs on Spark
      – GPU/CUDA overview
      – Spark + GPU for ML workloads
    • How to program GPUs in Spark
      – Invoke a hand-tuned GPU program written in CUDA
      – Translate a DataFrame program to GPU code automatically
    • Which key factors accelerate a program
      – Parallelism in the program
      – Data format in memory
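  • 4. GPU/CUDA Overview
    • GPGPU: designed for throughput
    • CUDA, the most widely used GPU programming model, requires programmers to explicitly write operations to
      – allocate/deallocate device memory
      – copy data between CPU and GPU
      – execute the GPU kernel
      #include <cuda_runtime.h>

      // code for GPU: each thread doubles one element
      __global__ void GPUMultiplyBy2(const float* d_a, float* d_b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (n <= i) return;                              // skip threads past the end
        d_b[i] = d_a[i] * 2.0f;
      }

      // code for CPU: explicit allocation, copies, launch, and deallocation
      void fooCUDA(const float* A, float* B, int N) {
        size_t sizeN = N * sizeof(float);
        float *d_A, *d_B;
        cudaMalloc(&d_A, sizeN); cudaMalloc(&d_B, sizeN);            // allocate device memory
        cudaMemcpy(d_A, A, sizeN, cudaMemcpyHostToDevice);            // copy input CPU -> GPU
        int threads = 256;                                            // threads per block
        GPUMultiplyBy2<<<(N + threads - 1) / threads, threads>>>(d_A, d_B, N);
        cudaMemcpy(B, d_B, sizeN, cudaMemcpyDeviceToHost);            // copy result GPU -> CPU
        cudaFree(d_B); cudaFree(d_A);                                 // deallocate device memory
      }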
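The kernel above derives a global index from blockIdx.x, blockDim.x and threadIdx.x, and it guards with `n <= i`. The CPU-side model below shows why. It assumes a 1-D launch and runs every (block, thread) pair sequentially. The names `LaunchModel`, `numBlocks` and `multiplyBy2` are hypothetical, and this is only a sketch of the indexing arithmetic, not of real GPU execution.

```scala
// Sequential CPU model of a 1-D CUDA launch (hypothetical helper names).
object LaunchModel {
  // Ceiling division: enough blocks so that blocks * threadsPerBlock >= n.
  def numBlocks(n: Int, threadsPerBlock: Int): Int =
    (n + threadsPerBlock - 1) / threadsPerBlock

  // Executes the body of GPUMultiplyBy2 once per (block, thread) pair.
  def multiplyBy2(a: Array[Float], threadsPerBlock: Int): Array[Float] = {
    val n = a.length
    val b = new Array[Float](n)
    for {
      blockIdx  <- 0 until numBlocks(n, threadsPerBlock)
      threadIdx <- 0 until threadsPerBlock
    } {
      val i = blockIdx * threadsPerBlock + threadIdx // global thread index
      if (i < n) b(i) = a(i) * 2.0f                  // same bounds guard as the kernel
    }
    b
  }
}
```

For example, `LaunchModel.numBlocks(1000, 256)` is 4. That gives 1024 threads for 1000 elements, and the guard makes the last 24 threads do nothing.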
  • 5. Spark + GPU for ML workloads
    • Spark provides efficient ways to parallelize jobs across a cluster of nodes
    • GPUs provide thousands of cores to parallelize a job efficiently within a node
    • GPUs provide up to 100x faster processing than CPUs*
    • Combining Spark + GPU for lightning-fast processing
      – We will talk about two approaches
    * https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/
  • 6. Outline
    • Why accelerate your workloads using GPUs on Spark
    • How to program GPUs in Spark
      – Invoke a hand-tuned GPU program in CUDA
      – Translate a DataFrame program to GPU code automatically
    • Toward faster GPU code
    • How do the two frameworks work?
  • 7. Invoke a Hand-tuned GPU Program in CUDA
    • GPUEnabler simplifies development
    • Implemented as a Spark package
      – Can be dropped into your version of Spark
    • Easily launches hand-coded GPU kernels from the map() or reduce() parallel functions of RDD and Dataset
    • Manages GPU memory, copies data between GPU and CPU, and converts the data format
    • Available at https://github.com/IBMSparkGPU/GPUEnabler
  • 8. Example – hand-tuned CUDA kernel in Spark
    • CUDA is a programming language for GPUs defined by NVIDIA
    • PTX is NVIDIA's GPU assembly language; a PTX file can be generated from a CUDA file
    Step 1: Write CUDA kernels (without memory management and data copy)
      __global__ void multiplyBy2(int *in, int *out, long size) {
        long i = threadIdx.x + blockIdx.x * blockDim.x;  // global element index
        if (size <= i) return;                            // skip threads past the end
        out[i] = in[i] * 2;
      }
    Step 2: Write the Spark program
      object SparkExample {
        val mapFunction = new CUDAFunction("multiplyBy2", Seq("value"), "example.ptx")
        val output = sc.parallelize(1 to 65536, 24).cache
          .mapExtFunc(x => x*2, mapFunction).collect()
      }
    Step 3: Compile and submit (spark-submit options must precede the application jar)
      $ nvcc example.cu -ptx
      $ mvn package
      $ bin/spark-submit --packages com.ibm:gpu-enabler_2.11:1.0.0 --class SparkExample SparkExample.jar
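The following sketch estimates how much work each GPU launch sees for `sc.parallelize(1 to 65536, 24)`. It mirrors, to our understanding, the start/end arithmetic Spark uses to slice a local collection into partitions. It then assumes one `multiplyBy2` launch per partition with 1024 threads per block. Both of those are assumptions for illustration, not documented GPUEnabler behavior, and `PartitionModel` is a hypothetical name.

```scala
// Hypothetical model: partition sizes for parallelize(), then blocks per launch.
object PartitionModel {
  // Start/end offsets of each slice: [i*len/n, (i+1)*len/n).
  def slices(length: Long, numSlices: Int): Seq[(Int, Int)] =
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      (start, end)
    }

  // Assumes one kernel launch per partition, covering exactly that partition.
  def blocksPerPartition(length: Long, numSlices: Int, threadsPerBlock: Int): Seq[Int] =
    slices(length, numSlices).map { case (start, end) =>
      val size = end - start
      (size + threadsPerBlock - 1) / threadsPerBlock // ceiling division
    }
}
```

With 65536 elements and 24 slices, each partition holds 2730 or 2731 elements. Under the 1024-thread assumption, each launch would therefore need 3 blocks.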
  • 9. Performance Improvements of GPU Program over Parallel CPU
    • Achieves a 3.6x speedup for CUDA-based mini-batch logistic regression using one P100 card over 160 SMT threads on POWER8
    [Chart: relative execution time, normalized to the GPU version; shorter is better for GPU]
    Environment: IBM Power System S822LC for High Performance Computing "Minsky", 4 GHz, 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0, IBM Java pxl6480sr4fp2-20170322_01 (SR4 FP2), 128GB heap, Apache Spark 2.0.1, master="local[160]", GPUEnabler as of 2017/5/1, N=112000, features=8500, iterations=15, mini-batch size=10, parallelism(GPU)=8, parallelism(CPU)=320
  • 10. Outline
    • Why accelerate your workloads using GPUs on Spark
    • How to program GPUs in Spark
      – Invoke a hand-tuned GPU program in CUDA (GPUEnabler)
      – Translate a DataFrame program to GPU code automatically
    • Toward faster GPU code
    • How do the two frameworks work?
  • 11. "Transparent" GPU Exploitation
    • Enhanced Spark by modifying its source code
    • Accepts expressions in select(), selectExpr(), and reduce() of DataFrame
    • Automatically generates CUDA code from the DataFrame program
    • Automatically manages GPU memory and copies data between GPU and CPU
    • No data format conversion is required
  • 12. Example – "Transparent" GPU Exploitation
    • Write the Spark program with DataFrame
      object SparkExample {
        import spark.implicits._  // provides toDF and the $"col" syntax; assumes a SparkSession `spark`
        val output = sc.parallelize(1 to 65536, 24).toDF("value").cache
          .select($"value" * 2).cache
        output.show()
      }
    • Compile and submit it
      $ mvn package
      $ bin/spark-submit --class SparkExample SparkExample.jar
  • 13. Performance Improvements of Spark DataFrame Program over Parallel CPU
    • Achieves a 1.7x speedup for Spark vector multiplication using one P100 card over 160 SMT threads on POWER8
    [Chart: relative execution time, normalized to the GPU version; shorter is better for GPU]
    Environment: IBM Power System S822LC for High Performance Computing "Minsky", 4 GHz, 512GB memory, one P100 card, Fedora 7.3, CUDA 8.0, IBM Java pxl6480sr4fp2-20170322_01 (SR4 FP2), 128GB heap, based on Apache Spark master (id:657cb9), master="local[160]", N=480, vector length=1600, parallelism(GPU)=8, parallelism(CPU)=320
  • 14. Outline
    • Why accelerate your workloads using GPUs on Spark
    • How to program GPUs in Spark
      – Invoke a hand-tuned GPU program in CUDA (GPUEnabler)
      – Translate a DataFrame program to GPU code automatically
    • Toward faster GPU code
    • How do the two frameworks work?
  • 15. About Me – Kazuaki Ishizaki
    • Researcher at IBM Research in compiler optimization
    • Has worked on the IBM Java virtual machine for over 20 years
      – In particular, its just-in-time compiler
    • Active contributor to Spark since 2016
      – 98 commits, #37 in the world (25 commits, #8 in 2018)
    • Committer of GPUEnabler
    • Homepage: http://ibm.biz/ishizaki
    • GitHub: https://github.com/kiszk, Twitter: @kiszk
  • 16. Toward Faster GPU Code
    • Assign many parallel computations to GPU cores
    • Reduce the number of memory transactions to GPU memory
  • 17. Assign Many Parallel Computations to GPU Cores
    • Achieve high utilization of the GPU
    [Figure: GPU cores partly idle lead to low performance; all cores busy lead to high performance]
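The simplified model below shows why having many parallel work items matters. It treats n independent items as running in "waves" of `cores` items and reports the fraction of core slots that do useful work. It deliberately ignores warps, occupancy limits and memory stalls, so it is an illustration, not a performance predictor. The example uses 3584 as the core count, which is the number of FP32 CUDA cores on a Tesla P100; `UtilizationModel` is a hypothetical name.

```scala
// Toy utilization model: n work items scheduled in waves of `cores` items.
object UtilizationModel {
  // Number of waves needed to run n items on `cores` cores (ceiling division).
  def waves(n: Long, cores: Long): Long = (n + cores - 1) / cores

  // Fraction of core slots doing useful work across all waves.
  def utilization(n: Long, cores: Long): Double = {
    require(n > 0 && cores > 0, "n and cores must be positive")
    n.toDouble / (waves(n, cores) * cores)
  }
}
```

With only 100 items on 3584 cores, fewer than 3% of the cores are busy. With 35840 items the model reaches full utilization.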
  • 18. Reduce the Number of Memory Transactions
    • Depending on the memory layout, the same program issues a different number of memory transactions
    [Figure: four GPU cores each load Pt.x and then Pt.y for Pt(x: Int, y: Int). Row-oriented layout (x1 y1 x2 y2 x3 y3 x4 y4) needs 4 memory transactions to GPU device memory; column-oriented layout (x1 x2 x3 x4 y1 y2 y3 y4) needs only 2. Assumption: the GPU hardware can coalesce 4 consecutive data elements into one transaction.]
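The transaction counts in the figure can be reproduced with a small model. It adopts the slide's assumption that up to 4 consecutive elements coalesce into one transaction, interpreted here as aligned 4-element segments. It counts each load of Pt.x and Pt.y independently and ignores caches. `CoalescingModel` and its methods are hypothetical names.

```scala
// Counts memory transactions for n threads that load Pt.x and then Pt.y.
object CoalescingModel {
  val SegmentElems = 4 // elements coalesced into one transaction (slide's assumption)

  // Distinct aligned segments touched by one load issued by all threads together.
  def transactions(offsets: Seq[Int]): Int =
    offsets.map(_ / SegmentElems).distinct.size

  // Element offset of field (0 = x, 1 = y) for each point 0 until n.
  def offsets(n: Int, field: Int, rowMajor: Boolean): Seq[Int] =
    if (rowMajor) (0 until n).map(i => 2 * i + field) // x1 y1 x2 y2 ...
    else (0 until n).map(i => field * n + i)           // x1 x2 ... y1 y2 ...

  def totalTransactions(n: Int, rowMajor: Boolean): Int =
    transactions(offsets(n, 0, rowMajor)) + transactions(offsets(n, 1, rowMajor))
}
```

For four points, the row-oriented layout touches two segments per field (4 in total), while the column-oriented layout touches one per field (2 in total), matching the figure.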
  • 19. Toward Faster GPU Code
    • Assign many parallel computations to GPU cores
      – A Spark program is already written as a set of parallel operations, e.g. map, join, …
    • Reduce the number of memory transactions
      – Column-oriented layout achieves better performance
      – The paper* reports a 3.3x performance improvement in GPU kernel execution of k-means over row-oriented layout
    * Che, Shuai, Sheaffer, Jeremy W., and Skadron, Kevin. "Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems", SC'11
  • 20. Questions
    • How can we write a parallel program for GPUs on Spark?
    • How can we use column-oriented storage for GPUs in Spark?
  • 21. Questions
    • How can we write a parallel program for GPUs on Spark?
      – Thanks to the Spark programming model!
    • How can we use column-oriented storage for GPUs in Spark?
  • 22. Outline
    • Why accelerate your workloads using GPUs on Spark
    • How to program GPUs in Apache Spark
      – Invoke a hand-tuned GPU program in CUDA (GPUEnabler)
      – Translate a DataFrame program to GPU code automatically
    • Toward faster GPU code
    • How do the two frameworks work?
  • 23. Data Movement with GPUEnabler
    • Data in an RDD is moved off-heap as column-oriented storage
    [Figure: RDD1, in row-oriented storage on the Java heap, is format-converted to column-oriented storage off-heap and copied to the GPU, where map runs; results are copied back and format-converted to row-oriented storage for RDD3. Code shown: rdd1.cache; val rdd2 = rdd1.mapExtFunc(…); val rdd3 = rdd2.mapExtFunc(…)]
  • 24. Data Movement with GPUEnabler
    • Data in an RDD is moved off-heap as column-oriented storage
    [Figure: the same Java heap / off-heap / GPU pipeline as the previous slide, with rdd1.cache; val rdd2 = rdd1.mapExtFunc(…); val rdd3 = rdd2.mapExtFunc(…)]
    • Conversions and copies are done automatically
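The sketch below illustrates the format conversion these slides describe. A sequence of row-oriented `Point` objects on the Java heap is copied into two off-heap (direct) buffers, one per field, which is the column-oriented form a GPU kernel can read. `Point`, `toColumns` and `fromColumns` are hypothetical names chosen for illustration; this is not GPUEnabler's internal API.

```scala
import java.nio.{ByteBuffer, ByteOrder, IntBuffer}

// Hypothetical row type, as in the slides' Point(x, y).
case class Point(x: Int, y: Int)

object ColumnConversion {
  // Row-oriented Seq[Point] on the heap -> one off-heap IntBuffer per field.
  def toColumns(points: Seq[Point]): (IntBuffer, IntBuffer) = {
    def newColumn(): IntBuffer =
      ByteBuffer.allocateDirect(points.size * java.lang.Integer.BYTES) // off-heap memory
        .order(ByteOrder.nativeOrder())                               // byte order native code expects
        .asIntBuffer()
    val xs = newColumn()
    val ys = newColumn()
    points.foreach { p => xs.put(p.x); ys.put(p.y) }
    xs.flip(); ys.flip() // make the written range readable from index 0
    (xs, ys)
  }

  // Column-oriented buffers -> row-oriented points (the reverse conversion).
  def fromColumns(xs: IntBuffer, ys: IntBuffer): Seq[Point] =
    (0 until xs.limit()).map(i => Point(xs.get(i), ys.get(i)))
}
```

Keeping each field contiguous in native memory is what allows the coalesced loads discussed earlier, at the cost of one conversion each way.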
  • 25. How to Write GPU Code in GPUEnabler
    • Write a GPU kernel that corresponds to map()
    Spark program:
      val rdd2 = rdd1
        .mapExtFunc(p => Point(p.x*2, p.y*2), mapFunction)
    GPU kernel:
      __global__ void multiplyBy2(int *inx, int *iny, int *outx, int *outy, long size) {
        long i = threadIdx.x + blockIdx.x * blockDim.x;  // global element index
        if (size <= i) return;                            // skip threads past the end
        outx[i] = inx[i] * 2;
        outy[i] = iny[i] * 2;
      }
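One way to check that a column-oriented kernel matches the row-oriented Spark lambda is a CPU reference of the kernel body. The sketch below runs the `multiplyBy2` logic over the column arrays (inx, iny to outx, outy). It then rebuilds rows so the result can be compared with `p => Point(p.x*2, p.y*2)`. All names are illustrative, and this models the kernel's semantics only, not its GPU execution.

```scala
// CPU reference for the column-oriented multiplyBy2 kernel (illustrative names).
object PointKernelReference {
  case class Point(x: Int, y: Int)

  // Kernel body applied to every index i < size, sequentially.
  def multiplyBy2(inx: Array[Int], iny: Array[Int]): (Array[Int], Array[Int]) = {
    val size = inx.length
    val outx = new Array[Int](size)
    val outy = new Array[Int](size)
    for (i <- 0 until size) {
      outx(i) = inx(i) * 2
      outy(i) = iny(i) * 2
    }
    (outx, outy)
  }

  // Split rows into columns, run the "kernel", and reassemble rows.
  def viaColumns(points: Seq[Point]): Seq[Point] = {
    val (outx, outy) = multiplyBy2(points.map(_.x).toArray, points.map(_.y).toArray)
    outx.indices.map(i => Point(outx(i), outy(i)))
  }
}
```

A reference like this makes it easy to test that the kernel and the Scala lambda agree before running on a GPU.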
  • 26. Data Movement with DataFrame
    • The DataFrame cache already uses column-oriented storage
    • We enhanced Spark to create the DataFrame cache off-heap
      – Reduces conversion and data copy between the Java heap and off-heap
    [Figure: df1's cache (column-oriented, off-heap) is copied to the GPU, where map + map run; the result is copied back into df3's cache (column-oriented, off-heap). Code shown: df1.cache; val df2 = df1.map(…); val df3 = df2.map(…).cache]
  • 27. Data Movement with DataFrame
    • The DataFrame cache already uses column-oriented storage
    • We enhanced Spark to create the DataFrame cache off-heap
      – Reduces conversion and data copy between the Java heap and off-heap
    [Figure: the same pipeline as the previous slide, with df1.cache; val df2 = df1.map(…); val df3 = df2.map(…).cache]
    • Copies are done automatically
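The figure shows "map + map" executing together on the GPU, which appears to be operator fusion: two consecutive maps become one pass over the column. The sketch below contrasts an unfused version, which materializes an intermediate column, with a fused one-pass loop. It is a hypothetical model of the idea, not Catalyst's code generator.

```scala
// Fusing two consecutive maps over a column (hypothetical model).
object MapFusion {
  // Unfused: two passes and an intermediate column, like df2 materialized between maps.
  def unfused(col: Array[Int], f: Int => Int, g: Int => Int): Array[Int] = {
    val tmp = col.map(f)
    tmp.map(g)
  }

  // Fused: a single pass computing g(f(x)); no intermediate column is allocated.
  def fused(col: Array[Int], f: Int => Int, g: Int => Int): Array[Int] = {
    val out = new Array[Int](col.length)
    var i = 0
    while (i < col.length) {
      out(i) = g(f(col(i)))
      i += 1
    }
    out
  }
}
```

On a GPU, the fused form corresponds to one kernel that reads the input column once and writes the output once. This halves the memory traffic compared with two separate kernels.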
  • 28. How to Generate GPU Code from DataFrame
    • Added a new path to Catalyst that generates CUDA code
    [Figure, derived from "Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming" by Michael Armbrust: DataFrame program → unresolved logical plan → (analysis) → logical plan → (optimizations) → optimized logical plan → (planning) → physical plan → (code generation) → Java, plus the newly added CUDA code for the kernel]
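To make the idea of generating CUDA from a DataFrame expression concrete, the sketch below translates a toy expression tree (for `$"value" * 2`) into CUDA C source for a 1-D kernel. It uses the same index computation and bounds guard as the hand-written kernels. The `Expr` classes and `CudaCodegen` are hypothetical; they stand in for Catalyst's expressions and are not Spark's API.

```scala
// Toy CUDA code generator for column-wise expressions (not Catalyst's API).
object CudaCodegen {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class Mul(left: Expr, right: Expr) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr

  // C expression that evaluates e at element index i.
  def emit(e: Expr): String = e match {
    case Col(name) => s"$name[i]"
    case Lit(v)    => v.toString
    case Mul(l, r) => s"(${emit(l)} * ${emit(r)})"
    case Add(l, r) => s"(${emit(l)} + ${emit(r)})"
  }

  // Input columns referenced by the expression, in first-use order.
  def columns(e: Expr): List[String] = e match {
    case Col(n)    => List(n)
    case Lit(_)    => Nil
    case Mul(l, r) => (columns(l) ++ columns(r)).distinct
    case Add(l, r) => (columns(l) ++ columns(r)).distinct
  }

  // Wraps the expression in a kernel with one input pointer per column.
  def kernel(name: String, e: Expr): String = {
    val params = (columns(e).map(c => s"const int *$c") :+ "int *out" :+ "long size").mkString(", ")
    s"""__global__ void $name($params) {
       |  long i = threadIdx.x + (long)blockIdx.x * blockDim.x;
       |  if (size <= i) return;
       |  out[i] = ${emit(e)};
       |}""".stripMargin
  }
}
```

For example, `CudaCodegen.kernel("gen", CudaCodegen.Mul(CudaCodegen.Col("value"), CudaCodegen.Lit(2)))` produces a kernel whose body assigns `out[i] = (value[i] * 2);`.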
  • 29. Takeaway
    • Why accelerate your workloads using GPUs on Spark
      – Achieved up to 3.6x over 160-CPU-thread parallel execution
    • How to use GPUs on Spark
      – Invoke a hand-tuned GPU program in CUDA (GPUEnabler)
      – Translate a DataFrame program to GPU code automatically
    • How the two approaches execute programs on GPUs
      – They address ease of programming for many non-experts, not state-of-the-art performance for a small number of top-notch programmers