Yuval Degani, LinkedIn
Dr. Jithin Jose, Microsoft Azure
Tackling Network
Bottlenecks with
Hardware Accelerations:
Cloud vs. On-Premise
#UnifiedAnalytics #SparkAISummit
Intro
• An infinite loop of removing performance roadblocks
• With faster storage devices (DRAM, NVMe, SSD) and stronger-than-ever processing power (CPU, GPU, ASIC), a traditional network just can't keep up with the I/O flow
• Upgrading to higher wire speeds will rarely do the trick
• This is where co-designed hardware acceleration can be used to truly utilize the power of a compute cluster
Previous talks
Spark Summit Europe 2017
First open-source stand-alone RDMA-accelerated shuffle plugin for Spark (SparkRDMA)

Spark+AI Summit North America 2018
First preview of SparkRDMA on Azure HPC nodes, demonstrating a 2.6x job speed-up on cloud VMs
Network Bottlenecks in the Wild
• Not always caused by a lack of bandwidth
• Network I/O imposes overhead in many system components:
– Memory management
– Memory copies
– Garbage collection
– Serialization/compression/encryption
• Overhead = CPU cycles, cycles that are not available for the actual job at hand
• Hardware acceleration can reduce this overhead and allow better utilization of compute and network resources
Network Bottlenecks: Shuffle
• Most expensive non-storage network I/O in compute clusters
• Blocking, massive movement of transient data
• Acceleration opportunities:
– Efficient serving with reduced server-side logic
– Serialization/compression/encryption
– Reduce I/O overhead and latency by employing modern transport protocols
[Chart] HiBench TeraSort on Spark, stage time breakdown: Shuffle Read 57%, Output 28%, Input 11%, Partitioning 4%
Network Bottlenecks: Distributed Training
• Model updates create massive network traffic
• Model update frequency rises as GPUs get faster
• Acceleration opportunities:
– Inter-GPU RDMA communication
– Lower-latency network transport
– Collectives offloads
[Chart] ResNet-269 training on K80, M60, and V100 GPUs: total time vs. GPU active time*
* "Parameter Hub: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training" by Luo et al.
Network Bottlenecks: Storage
• Massive data movement
• Premium devices (DRAM, Flash) provide storage access speeds never seen before
• Acceleration opportunities:
– Higher bandwidth
– Reduced transport overhead
– OS/CPU bypass: direct storage access from network devices
Major Hardware Acceleration Technologies
Speeds
• 1, 10, 25, 40, 100, 200 Gbps
• A faster network doesn't necessarily mean a faster runtime
• Many workloads consist of relatively short bursts rather than sustained throughput: higher bandwidth may not have any effect
[Chart] Effect of network speed (1GbE, 10GbE, 40GbE) on workload runtime* for Flink TeraSort, Flink PageRank, PowerGraph PageRank, and Timely PageRank
* "On The [Ir]relevance of Network Performance for Data Processing" by Trivedi et al.
InfiniBand
• De facto standard in the HPC world
• FDR: 56 Gbps, EDR: 100 Gbps, HDR: 200 Gbps
• Sub-microsecond latency
• Native support for RDMA
• Hardware-accelerated transport layer
• True SDN: standard fabric components are developed as open source and are cross-platform
• Native support for switch collectives offload
[Chart] TOP500 supercomputers, interconnect performance share*: InfiniBand 38%, Custom 28%, Ethernet 23%, Omnipath 10%, Proprietary 1%
* www.top500.org
RDMA
• Remote Direct Memory Access
– Read/write from/to remote memory locations
• Zero-copy
• Direct hardware interface: bypasses the kernel and TCP/IP in the I/O path
• Flow control and reliability are offloaded to hardware
• Supported on almost all mid-range/high-end network adapters, both InfiniBand and Ethernet
[Diagram] Socket vs. RDMA data path: a socket send from a Java app buffer crosses into the OS (context switch) and traverses the sockets layer, TCP/IP stack, and driver before reaching the network adapter; RDMA goes from the application buffer straight to the network adapter
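To make the one-sided model concrete, here is a minimal sketch using MPI-3 RMA windows via mpi4py, which map onto RDMA verbs on InfiniBand fabrics. This illustrates remote read/write semantics only (mpi4py and an RDMA-capable MPI are assumed); it is not a verbs-level implementation.

```python
# rma_demo.py -- run with: mpiexec -n 2 python rma_demo.py
# A minimal sketch of RDMA-style one-sided communication using MPI-3
# RMA windows (mpi4py). On InfiniBand, Put/Get map onto RDMA write/read:
# the target rank's CPU is not involved in the data transfer.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nbytes = 8 * MPI.DOUBLE.Get_size()
# Allocate a registered memory window that remote ranks can access.
win = MPI.Win.Allocate(nbytes, comm=comm)
local = np.frombuffer(win.tomemory(), dtype='d')
local[:] = 0.0

win.Fence()  # open an access epoch (collective synchronization)
if rank == 0:
    payload = np.arange(8, dtype='d')
    # One-sided write directly into rank 1's window -- no Recv on rank 1.
    win.Put([payload, MPI.DOUBLE], 1)
win.Fence()  # close the epoch; the data is now visible at the target

if rank == 1:
    print("rank 1 window contents:", local)
win.Free()
```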
NVIDIA GPUDirect
• Direct DMA over PCIe
• RDMA devices can write/read directly to/from GPU memory over the network
• No CPU overhead
• Zero-copy
[Diagram] GPUDirect vs. non-GPUDirect path between NIC and GPU: without GPUDirect, transfers are staged through CPU/host memory; with GPUDirect, the NIC DMAs directly to and from GPU memory
“Smart NIC” – FPGA/ASIC Offloads
• FPGA – tailor-made accelerations
• ASIC – less flexibility, better performance
• Common use cases:
– I/O: Serialization, compression, encryption offloads
– Data: Aggregation, sorting, group-by, reduce
• Deployment options:
– Pipeline
– Look-aside
– Bump-on-the-wire
“Smart Switch”
• In-network processing
– Data reduction during movement
– Wire-speed
• Generic: MPI Switch Collectives Offloads (e.g.
Mellanox SHArP)
• Per-workload: Programmable switches (e.g.
Barefoot Tofino)
– Example: Network-Accelerated Query Processing
NVMeOF
• Network protocol for NVM Express (PCIe) disks
• Uses RDMA to provide direct NIC<->disk access
• Completely bypasses the host CPU
• Minimal latency difference between local and remote access
[Diagram] Traditional remote storage path vs. NVMeOF: the traditional path routes disk I/O through the host CPU; with NVMeOF, the NIC accesses the NVMe disk directly
Azure Network Acceleration Offering
Offer 'Bare Metal' Experience – Azure HPC Solution

Eliminate Jitter
Host holdback is a start, but the guest must be completely isolated from the host
Minroot & CPU Groups; separated host and guest VM sandboxes

Full Network Experience
Enable customers to use Mellanox or OFED drivers
Supports all MPI types and versions
Leverage hardware offload to the Mellanox InfiniBand ASIC

Transparent Exposure of Hardware
Core N in the guest VM should equal core N in silicon
1:1 mapping between the physical pNUMA topology and the vNUMA topology
Latest Azure HPC Offerings – HB/HC

                      HB Series (AMD EPYC)      HC Series (Intel Xeon Platinum)
Workload Targets      Bandwidth Intensive       Compute Intensive
Core Count            60                        44
System Memory         240 GB                    352 GB
Network               100 Gbps EDR InfiniBand, 40 Gbps Ethernet
Storage Support       Standard / Premium Azure Storage, and 700 GB local SSD
OS Support for RDMA   CentOS/RHEL, Ubuntu, SLES 12, Windows
MPI Support           OpenMPI, HPC-X, MVAPICH2, MPICH, Intel MPI, PlatformMPI, Microsoft MPI
Hardware Collectives  Enabled
Access Model          Azure CLI, ARM template, Azure CycleCloud, Azure Batch, Partner Platform
Other Azure HPC Highlights
• SR-IOV going broad
– All HPC SKUs will support SR-IOV
– Driver/SKU performance optimizations
• GPUs
– Latest NDv2 series
• 8 NVIDIA Tesla V100 GPUs, interconnected with NVLink
• Intel Skylake, 672 GB memory
• Excellent platform for HPC and AI workloads
• Azure FPGA
– Based on Project Brainwave
– Deploy a model to Azure FPGA; reconfigure for different models
– Supports ResNet-50, ResNet-152, DenseNet-121, and VGG-16
Accelerate Your Framework
MPI Microbenchmarks
• Experiments on an HC cluster
• OSU Benchmarks 5.6.1
• OpenMPI (4.0.0) + UCX (1.5.0)
• MPI ranks pinned near the HCA
• MPI latency (4 B): 1.77 us, getting even better later this year
• MPI bandwidth (4 MB): 12.06 GB/s
[Chart] MPI bandwidth (MB/s) vs. message size (1 B to 4 MB): Ethernet (40 Gbps), IPoIB (100 Gbps), RDMA (100 Gbps)
[Chart] MPI latency (us) vs. message size (0 B to 2 KB): Ethernet (40 Gbps), IPoIB (100 Gbps), RDMA (100 Gbps)
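For reference, the latency measurement pattern behind osu_latency can be sketched in a few lines of mpi4py (assumed installed on the cluster); the OSU suite itself remains the authoritative benchmark.

```python
# pingpong.py -- run with: mpiexec -n 2 python pingpong.py
# A minimal OSU-style latency ping-pong sketch using mpi4py.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

msg = np.zeros(4, dtype=np.uint8)   # 4-byte message, as in the slide
warmup, iters = 100, 10000

for i in range(warmup + iters):
    if i == warmup:
        comm.Barrier()
        t0 = MPI.Wtime()
    if rank == 0:
        comm.Send([msg, MPI.BYTE], dest=1)
        comm.Recv([msg, MPI.BYTE], source=1)
    else:
        comm.Recv([msg, MPI.BYTE], source=0)
        comm.Send([msg, MPI.BYTE], dest=0)

if rank == 0:
    # One-way latency: half the round-trip time, averaged over iterations.
    print("latency: %.2f us" % ((MPI.Wtime() - t0) / iters / 2 * 1e6))
```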
SparkRDMA
• RDMA-powered ShuffleManager plugin for Apache Spark
• Similarly specced 8-node clusters:
– On-prem: 100GbE RoCE
– Cloud: Azure "h16mr" instances with 56 Gbps InfiniBand
• https://github.com/Mellanox/SparkRDMA
[Chart] Execution time for TeraSort 320GB and PageRank 19GB: on-prem non-RDMA 100GbE, on-prem RDMA 100GbE, Azure IPoIB 56Gbps, Azure RDMA 56Gbps
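Enabling the plugin is purely a configuration change. A sketch following the pattern in the SparkRDMA README is below; the jar path is a placeholder, and exact settings may vary by plugin version.

```python
# Sketch: enabling SparkRDMA for a PySpark job. The ShuffleManager class
# name follows the SparkRDMA README; the jar path is a placeholder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("terasort-rdma")
    # Swap the default sort-based shuffle for the RDMA shuffle manager.
    .config("spark.shuffle.manager",
            "org.apache.spark.shuffle.rdma.RdmaShuffleManager")
    # The plugin jar must be on both driver and executor classpaths.
    .config("spark.driver.extraClassPath", "/opt/spark-rdma/spark-rdma.jar")
    .config("spark.executor.extraClassPath", "/opt/spark-rdma/spark-rdma.jar")
    .getOrCreate()
)
```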
SparkRDMA on Azure
• Azure HC cluster:
– 100 Gbps InfiniBand
– 16 Spark Workers/HDFS DataNodes
– Separate NameNode
– Data folder hosted on SSD
– HiBench benchmarks ('gigantic' scale profile)
• Spark 2.4.0, Hadoop 2.7.7, SparkRDMA 3.1
[Chart] Execution time (s) for TeraSort 320 GB and PageRank 19GB: RDMA (100 Gbps) vs. IPoIB (100 Gbps)
HDFS-RDMA on Azure
• OSU HDFS RDMA 0.9.1
• Based on Hadoop 3.0.0
• http://hibd.cse.ohio-state.edu/#hadoop3
• HDFS on HC cluster
• 1 NameNode
• 16 DataNodes
• Data folder hosted on SSD
• Packet Size: 128 KB
• Containers per Node: 32
[Chart] TestDFSIO (Write) execution time (s) vs. total data size (512 GB to 1 TB): Ethernet (40 Gbps), IPoIB (100 Gbps), RDMA (100 Gbps)
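A hedged sketch of driving these TestDFSIO write runs from Python follows; the tests-jar path is a placeholder, and TestDFSIO option names (-size vs. -fileSize) vary across Hadoop versions, so verify against your distribution first.

```python
# Sketch: reproducing the TestDFSIO write runs. The jar path and the
# -size/-nrFiles flags are assumptions -- check the TestDFSIO usage
# output on your Hadoop cluster before running.
import subprocess

TESTS_JAR = "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-tests.jar"  # placeholder

for total_gb in (512, 640, 768, 896, 1024):
    n_files = 512  # 16 DataNodes x 32 containers, as in the slide setup
    size_mb = total_gb * 1024 // n_files
    subprocess.run(
        ["hadoop", "jar", TESTS_JAR, "TestDFSIO",
         "-write", "-nrFiles", str(n_files), "-size", f"{size_mb}MB"],
        check=True,
    )
```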
Memcached-RDMA on Azure
• OSU Memcached RDMA 0.9.6
• Based on Memcached 1.5.3 and
libmemcached 1.0.18
• http://hibd.cse.ohio-state.edu/#memcached
• Experiment run on HC Nodes
• Memcached GET (8 B) latency: 5.5 us
• Memcached SET (8 B) latency: 6.45 us
[Chart] Memcached GET latency (us) vs. message size (1 B to 4 KB): Ethernet (40 Gbps), IPoIB (100 Gbps), RDMA (100 Gbps)
[Chart] Memcached SET latency (us) vs. message size (1 B to 4 KB): Ethernet (40 Gbps), IPoIB (100 Gbps), RDMA (100 Gbps)
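The measurement pattern behind these GET/SET latencies can be sketched with pymemcache over plain TCP (the host below is a placeholder); the RDMA transport itself lives inside the OSU-modified Memcached client and server.

```python
# Sketch: measuring Memcached GET/SET latency with pymemcache over TCP.
# The host is a placeholder; this only shows the measurement pattern.
import time
from pymemcache.client.base import Client

client = Client(("memcached-host", 11211))  # placeholder host
payload = b"x" * 8                           # 8-byte value, as in the slide
iters = 10000

client.set("key", payload)
t0 = time.perf_counter()
for _ in range(iters):
    client.get("key")
print("GET latency: %.2f us" % ((time.perf_counter() - t0) / iters * 1e6))

t0 = time.perf_counter()
for _ in range(iters):
    client.set("key", payload)
print("SET latency: %.2f us" % ((time.perf_counter() - t0) / iters * 1e6))
```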
Kafka-RDMA on Azure
• OSU Kafka RDMA 0.9.1
• Based on Apache Kafka 1.0.0
• http://hibd.cse.ohio-state.edu/#kafka
• HC cluster
• Broker with 100 GB Ramdisk
• Record Size – 100 bytes
• Number of Records – 500,000
[Chart] Kafka producer latency, time (s): IPoIB (100 Gbps) vs. RDMA (100 Gbps)
[Chart] Kafka producer bandwidth (MB/s): IPoIB (100 Gbps) vs. RDMA (100 Gbps)
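A producer run with the slide's record size and count can be sketched with kafka-python against a stock broker (broker address and topic are placeholders); the RDMA path is inside the OSU-modified Kafka.

```python
# Sketch: a producer micro-benchmark matching the slide's parameters,
# using kafka-python. Broker address and topic are placeholders.
import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="broker-host:9092")  # placeholder
record = b"x" * 100          # 100-byte records, as in the slide
num_records = 500_000

t0 = time.perf_counter()
for _ in range(num_records):
    producer.send("bench-topic", record)
producer.flush()             # wait until all records are acknowledged
elapsed = time.perf_counter() - t0

print("time: %.1f s, bandwidth: %.1f MB/s"
      % (elapsed, num_records * len(record) / elapsed / 1e6))
```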
Horovod on Azure
• TensorFlow 1.13
– ResNet-50 training
– Partial ImageNet data
– Batch size = 64 per worker
– 2 workers per node
– 100 batches total
– CPU-only version
• HC cluster
– OpenMPI 4.0 + UCX 1.5
– Singularity container
• ~97% scaling efficiency
[Chart] ResNet-50 training throughput (images/second) and scaling efficiency (%) on 2, 4, 8, and 16 nodes: IPoIB (100 Gbps) vs. RDMA (100 Gbps). Efficiency: IPoIB 100.00 / 96.78 / 95.58 / 94.93; RDMA 100.00 / 98.86 / 98.37 / 96.94
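The Horovod changes to a training script are small. Below is a minimal sketch in the tf.keras style of the TensorFlow 1.13 era; the model input and random data are placeholders standing in for the ImageNet pipeline.

```python
# Sketch: the core Horovod wiring for data-parallel training, tf.keras
# style circa TensorFlow 1.13. Launch with: mpiexec -n <ranks> python train.py
import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one rank per worker (2 per node in the slide's setup)

model = tf.keras.applications.ResNet50(weights=None)

# Scale the learning rate by world size, then wrap the optimizer so
# gradients are averaged across ranks (MPI/RDMA underneath).
opt = tf.keras.optimizers.SGD(lr=0.001 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(loss="categorical_crossentropy", optimizer=opt)

callbacks = [
    # Make all ranks start from rank 0's initial weights.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Placeholder random data standing in for the ImageNet pipeline.
x = np.random.rand(64, 224, 224, 3).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(1000, size=64), 1000)
model.fit(x, y, batch_size=64, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```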
Wrapping up
What's available on major clouds?

Technology       Azure    AWS            GCP
Network speeds   100Gbps  100Gbps        20Gbps?
InfiniBand       ✔        ✘              ✘
RDMA             ✔        (limited)      ✘
GPUDirect        ✘        (single host)  ✘
Smart NIC        ✘        ✘              ✘
Smart Switch     ✘        ✘              ✘
NVMeOF           ✘        ✘              ✘
Take-aways
• Accelerated Frameworks:
– SparkRDMA on GitHub
– High Performance Big Data (From OSU)
– Horovod
• Azure instances
– Azure HPC HB/HC
– Azure NDv2 GPUs
– Azure FPGA
Questions?
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT