Software AI Accelerators
The Next Frontier
Software for AI Optimization Summit
Wei Li
VP & GM, Machine Learning Performance
Intel Corporation
HARDWARE AI ACCELERATORS
HW Acceleration
SOFTWARE AI ACCELERATORS
HW Acceleration with SW Acceleration: up to 10-100x
Photo Source: NASA
AI HARDWARE SPECTRUM
From general purpose to purpose built: CPU, GPU, Accelerators
UNSCALABLE TO SCALABLE SOFTWARE
Unscalable: Services & Solutions and Applications sit on top of a separate full stack for every device (GPU, Accelerator [1], CPU, …, Accelerator [N]), each with its own Middleware, Frameworks and Runtimes, Low Level Libraries, Virtualization/Orchestration, OS, Drivers, and FW IP & BIOS.
Scalable: Services & Solutions, Applications, and a single shared layer of Middleware, Frameworks and Runtimes span GPU, Accelerators and CPU.
AI SOFTWARE STACK
Layers, top to bottom: Data Scientists & Developers; AI/Analytics Tools, Toolkits, Verticals; Deep Learning, Machine Learning, Big Data Frameworks; Libraries & Compilers; HW (GPU, Accelerators, CPU)
Components shown: Intel® LPOT (Low Precision Optimization Tool), Analytics Zoo, Intel® oneAPI AI Analytics Toolkit, SigOpt, PaddlePaddle, TensorFlow, Python/Numba, TVM, PyTorch, MXNet, Spark SQL + ML/DL scale-out, Modin, NumPy, XGBoost, Scikit-Learn, Pandas, OpenVINO
KERNEL OPTIMIZATION EXAMPLE
A simple program is good, but may be slow
Optimized convolution in oneDNN
Optimizations: vectorization, data reuse, parallelization
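To make the contrast concrete, here is a minimal NumPy sketch (an illustration under stated assumptions, not oneDNN's algorithm): a direct nested-loop convolution next to an im2col version that rewrites the same computation as one matrix multiply, so the vectorized, cache-blocked and typically multithreaded BLAS behind NumPy's `@` supplies the vectorization, data reuse and parallelization. It assumes a single image of shape (C, H, W), filters of shape (K, C, R, S), stride 1 and no padding; the function names are hypothetical.

```python
import numpy as np


def conv2d_naive(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Direct convolution (cross-correlation, as in DL frameworks), stride 1, no padding.

    x: input of shape (C, H, W); w: filters of shape (K, C, R, S).
    Every output element is an explicit scalar loop: simple, but slow.
    """
    C, H, W = x.shape
    K, _, R, S = w.shape
    out = np.zeros((K, H - R + 1, W - S + 1), dtype=x.dtype)
    for k in range(K):
        for i in range(H - R + 1):
            for j in range(W - S + 1):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c, i + r, j + s] * w[k, c, r, s]
                out[k, i, j] = acc
    return out


def conv2d_im2col(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Same result, reorganized as one matrix multiply (im2col + GEMM)."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    OH, OW = H - R + 1, W - S + 1
    # Row (c, r, s) of `cols` holds the input values that filter tap (c, r, s)
    # touches for every output position, giving shape (C*R*S, OH*OW).
    cols = np.empty((C * R * S, OH * OW), dtype=x.dtype)
    row = 0
    for c in range(C):
        for r in range(R):
            for s in range(S):
                cols[row] = x[c, r:r + OH, s:s + OW].reshape(-1)
                row += 1
    # One GEMM: (K, C*R*S) @ (C*R*S, OH*OW) -> (K, OH*OW).
    out = w.reshape(K, C * R * S) @ cols
    return out.reshape(K, OH, OW)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 16, 16))
    w = rng.standard_normal((8, 3, 3, 3))
    assert np.allclose(conv2d_naive(x, w), conv2d_im2col(x, w))
```

im2col trades extra memory (each input value is copied up to R*S times) for a single large, regular GEMM. Production libraries typically go further, for example with blocked memory layouts and JIT-generated kernels tuned to the instruction set.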
GRAPH OPTIMIZATION EXAMPLE
[Figure: a Baseline graph of Conv1x1, Conv3x3, BatchNorm, ReLU and Sum nodes is transformed into an INT8 Optimized Model (generated by Intel Low Precision Optimization Tool) in three steps: BN Folding (BatchNorm nodes absorbed into the preceding convolutions, giving Conv1x1' and Conv3x3'), Conv + ReLU fusion, and Conv + Sum fusion (leaving fused nodes Conv1x1'', Conv3x3'', Conv1x1''' and Sum')]
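The BN Folding step rests on a small identity: at inference time BatchNorm applies a fixed per-channel affine map, so it can be merged into the convolution before it. With per-output-channel scale γ/√(σ² + ε), the folded weights are W' = W · γ/√(σ² + ε) and the folded bias is b' = (b − μ) · γ/√(σ² + ε) + β. The NumPy sketch below checks this for a 1x1 convolution written as a matrix multiply; it assumes the standard inference-mode BatchNorm formula, and the helper names are hypothetical rather than part of Intel Low Precision Optimization Tool.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """1x1 convolution on x of shape (C, P), P = flattened pixels; w: (K, C), b: (K,)."""
    return w @ x + b[:, None]


def batchnorm_inference(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm with fixed per-channel statistics; y: (K, P)."""
    return gamma[:, None] * (y - mean[:, None]) / np.sqrt(var[:, None] + eps) + beta[:, None]


def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') with conv1x1(x, w', b') == batchnorm(conv1x1(x, w, b))."""
    scale = gamma / np.sqrt(var + eps)   # one factor per output channel
    w_folded = w * scale[:, None]        # rescale each filter (row of w)
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, K, P = 8, 4, 10
    x = rng.standard_normal((C, P))
    w, b = rng.standard_normal((K, C)), rng.standard_normal(K)
    gamma, beta, mean = rng.standard_normal(K), rng.standard_normal(K), rng.standard_normal(K)
    var = rng.random(K) + 0.5            # variances are non-negative
    reference = batchnorm_inference(conv1x1(x, w, b), gamma, beta, mean, var)
    w2, b2 = fold_bn_into_conv(w, b, gamma, beta, mean, var)
    assert np.allclose(conv1x1(x, w2, b2), reference)
```

A Conv + ReLU fusion would then apply max(0, ·) to the folded output inside the same kernel, saving a separate pass over the activations in memory.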
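INT8 multiply-accumulate: 64 input pairs (A0…A63, B0…B63) feed 16 accumulators (C0…C15), four products per accumulator:
$$C_0 \leftarrow A_0B_0 + A_1B_1 + A_2B_2 + A_3B_3 + C_0,\quad \dots,\quad C_{15} \leftarrow A_{60}B_{60} + A_{61}B_{61} + A_{62}B_{62} + A_{63}B_{63} + C_{15}$$
that is, $C_i \leftarrow C_i + \sum_{k=0}^{3} A_{4i+k}B_{4i+k}$ for $i = 0, \dots, 15$.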
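A minimal NumPy sketch of the arithmetic above, under stated assumptions: FP32 vectors are quantized to INT8 with one symmetric per-tensor scale (a common scheme, assumed here rather than taken from the slide), each group of four 8-bit products is summed into a 32-bit accumulator exactly as in the C0…C15 update, and the result is dequantized with the product of the two scales. Hardware such as AVX-512 VNNI's VPDPBUSD multiplies unsigned by signed bytes; for simplicity this sketch uses signed INT8 on both sides. Function names are hypothetical.

```python
import numpy as np


def quantize_symmetric(x: np.ndarray):
    """Per-tensor symmetric quantization: x is approximated by scale * q, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # all-zero input: any scale works
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale


def int8_dot_16x4(a_q: np.ndarray, b_q: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Return c[i] + sum_k a_q[4i+k] * b_q[4i+k] for i in 0..15, computed in int32.

    a_q, b_q: int8 vectors of length 64; c: int32 vector of length 16.
    Widening to int32 before multiplying keeps the 8-bit products from overflowing.
    """
    if a_q.shape != (64,) or b_q.shape != (64,) or c.shape != (16,):
        raise ValueError("expected two length-64 int8 vectors and 16 int32 accumulators")
    prods = a_q.astype(np.int32) * b_q.astype(np.int32)          # 64 products
    return c + prods.reshape(16, 4).sum(axis=1, dtype=np.int32)  # 16 lanes of 4


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal(64).astype(np.float32)
    b = rng.standard_normal(64).astype(np.float32)
    a_q, sa = quantize_symmetric(a)
    b_q, sb = quantize_symmetric(b)
    acc = int8_dot_16x4(a_q, b_q, np.zeros(16, dtype=np.int32))
    approx = sa * sb * int(acc.sum())  # dequantize the full 64-element dot product
    print(f"INT8 estimate {approx:.4f} vs FP32 {float(a @ b):.4f}")
```

The two results are close but not identical; that quantization error is what a low-precision tool has to keep within an accuracy budget.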
Intel Optimization for TENSORFLOW
IMMEDIATE PERFORMANCE BENEFITS
Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor with 1 TB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xd000280, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-73-generic1, Intel 900GB SSD OS Drive;
ResNet50 v1.5, FP32/INT8, BS=128, https://github.com/IntelAI/models/blob/master/benchmarks/image_recognition/tensorflow/resnet50v1_5/README.md; SSD-MobileNetv1, FP32/INT8, BS=448, https://github.com/IntelAI/models/blob/master/benchmarks/object_detection/tensorflow/ssd-mobilenet/README.md. Software: TensorFlow 2.4.0 for FP32 & Intel-Tensorflow (icx-base) for both FP32 and INT8, tested by Intel on 5/12/2021. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex.
Intel Optimization for PYTORCH
IMMEDIATE PERFORMANCE BENEFITS
Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor with 1 TB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xd000280, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-73-generic1, Intel 900GB SSD OS Drive;
ResNet50 v1.5, FP32/INT8, BS=128, https://github.com/IntelAI/models/blob/icx-launch-public/quickstart/ipex-bkc/resnet50-icx/inference; DLRM, FP32/INT8, BS=16, https://github.com/IntelAI/models/blob/icx-launch-public/quickstart/ipex-bkc/dlrm-icx/inference/fp32/README.md. Software: PyTorch v1.5 w/o DNNL build for FP32 & PyTorch v1.5 + IPEX (icx) for both FP32 and INT8, tested by Intel on 5/12/2021. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex.
Intel Optimization for MXNET
IMMEDIATE PERFORMANCE BENEFITS
Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor with 1 TB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xd000280, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-73-generic1, Intel 900GB SSD OS Drive;
ResNet50 v1, FP32/INT8, BS=128, https://github.com/apache/incubator-mxnet/blob/v2.0.0.alpha/python/mxnet/gluon/model_zoo/vision/resnet.py; MobileNetv2, FP32/INT8, BS=128, https://github.com/apache/incubator-mxnet/blob/v2.0.0.alpha/python/mxnet/gluon/model_zoo/vision/mobilenet.py. Software: MXNet 2.0.0.alpha w/o DNNL build for FP32 & MXNet 2.0.0.alpha for both FP32 and INT8, tested by Intel on 5/12/2021. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex.
Intel Extension for Scikit-learn
Intel Xeon Platinum 8276L CPU @ 2.20 GHz, 2 sockets, 28 cores per socket; For workloads and configurations visit www.Intel.com/PerformanceIndex.
Details: https://medium.com/intel-analytics-software/accelerate-your-scikit-learn-applications-a06cacf44912
PERFORMANCE IN KAGGLE COMPETITIONS
| Kaggle challenge | Domain | Algorithm(s) | Stock E2E time (minutes) | Intel Extension for Scikit-learn E2E time (minutes) | Speedup |
|---|---|---|---|---|---|
| KDD Cup 1999 | Computer Networks | kNN | 282 | 1.24 | 227.4x |
| Credit Card Default | Finance | SVC | 11.9 | 0.2 | 59.5x |
| Digit Recognizer (KNN) | Image Classification | SVC | 84.32 | 1.47 | 57.5x |
| Melanoma Identification | Image Classification | kNN | 99.89 | 2.08 | 48x |
| Digit Recognizer (SVM) | Image Classification | PCA, SVC | 125.5 | 4.92 | 25.5x |
| What's cooking? | Natural Language Processing | SVC, XGBoost | 35.8 | 2.66 | 13.5x |
| Real or Not? Disaster Tweets | Natural Language Processing | SVC | 37.8 | 4.27 | 8.9x |
| Home Credit Default | Finance | Random Forest | 2.9 | 1.44 | 2x |
Intel Xeon Gold 5218 @ 2.3 GHz (2nd generation Intel Xeon Scalable processors): 2 sockets, 16 cores per socket, HT:off, Turbo:off. For workloads and configurations visit www.Intel.com/PerformanceIndex.
Details: https://medium.com/intel-analytics-software/accelerate-kaggle-challenges-using-intel-ai-analytics-toolkit-beb148f66d5a
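The Speedup column is the stock E2E time divided by the Intel Extension for Scikit-learn E2E time. This short Python sketch recomputes it from the minutes listed in the table; the computed ratios agree with the reported ones to within 0.15x, the largest gap being Digit Recognizer (KNN) (57.4x computed vs 57.5x shown), presumably because the displayed times are rounded.

```python
# (challenge, stock minutes, Intel Extension for Scikit-learn minutes, reported speedup)
# Values copied from the Kaggle table.
KAGGLE_RESULTS = [
    ("KDD Cup 1999", 282, 1.24, 227.4),
    ("Credit Card Default", 11.9, 0.2, 59.5),
    ("Digit Recognizer (KNN)", 84.32, 1.47, 57.5),
    ("Melanoma Identification", 99.89, 2.08, 48),
    ("Digit Recognizer (SVM)", 125.5, 4.92, 25.5),
    ("What's cooking?", 35.8, 2.66, 13.5),
    ("Real or Not? Disaster Tweets", 37.8, 4.27, 8.9),
    ("Home Credit Default", 2.9, 1.44, 2),
]


def speedup(stock_minutes: float, optimized_minutes: float) -> float:
    """Speedup = baseline time / optimized time (higher is better)."""
    return stock_minutes / optimized_minutes


if __name__ == "__main__":
    for name, stock, optimized, reported in KAGGLE_RESULTS:
        print(f"{name:30s} computed {speedup(stock, optimized):6.1f}x  reported {reported}x")
```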
GRAPH ANALYTICS WITH oneDAL
Triangle Counting Algorithm
V = Vertices, E = Edges; speedup due to relabeling
[Chart: speedup (log scale, 1 to 1000) on seven data sets: Enron (V: 0.03M, E: 0.4M), Pokec (V: 1.6M, E: 30.6M), Google (V: 0.9M, E: 5.1M), Indochina-2004 (V: 7.4M, E: 151M), Wikipedia (V: 12.1M, E: 378M), Twitter (V: 61M, E: 1202M), Web (V: 50M, E: 1810M). Speedups shown: 1.38, 1.67, 1.74, 1.82, 2.98, 8.02 and 166.1]
Intel Xeon Platinum 8280 CPU @ 2.70 GHz, 2x28 cores, HT: on; For workloads and configurations visit www.Intel.com/PerformanceIndex.
Data sets: https://github.com/sbeamer/gapbs | https://snap.stanford.edu/data
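The slide attributes part of the speedup to relabeling. A common form of that idea (the GAP Benchmark Suite's triangle-counting kernel, for instance, relabels vertices by degree) is to rank vertices by degree, orient every edge from the lower-ranked to the higher-ranked endpoint, and count each triangle once from its lowest-ranked corner, which keeps neighbor lists short around high-degree hubs. The pure-Python sketch below illustrates that technique; it is not oneDAL's implementation, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def count_triangles(edges: Iterable[Tuple[int, int]]) -> int:
    """Count triangles in an undirected graph given as an edge list.

    Vertices are ranked by (degree, id); each edge is kept only in the
    direction of the higher-ranked endpoint, so every triangle is found
    exactly once, from its lowest-ranked vertex.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:                       # ignore self-loops; sets drop duplicates
            neighbors[u].add(v)
            neighbors[v].add(u)

    # "Relabel": rank = position in order of increasing degree (ties broken by id).
    order = sorted(neighbors, key=lambda x: (len(neighbors[x]), x))
    rank = {v: i for i, v in enumerate(order)}

    # Oriented adjacency: only neighbors with a higher rank.
    out = {v: {w for w in neighbors[v] if rank[w] > rank[v]} for v in neighbors}

    triangles = 0
    for v, higher in out.items():
        for w in higher:
            triangles += len(higher & out[w])  # common higher-ranked neighbors
    return triangles


if __name__ == "__main__":
    # A triangle 0-1-2 plus a dangling edge 2-3.
    print(count_triangles([(0, 1), (1, 2), (2, 0), (2, 3)]))  # → 1
```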
E2E WORKLOAD PERFORMANCE
[Charts: CENSUS phase-wise % breakdown (Readcsv, ETL, Train Test Split, ML) and CENSUS performance improvement with hyperparameter optimizations; PLAsTiCC phase-wise % breakdown (Readcsv, ETL, ML) and PLAsTiCC performance improvement with hyperparameter optimizations. Speedup per phase and for total time, comparing Unoptimized, Software Optimized and Optimized hyperparameters; higher is better. Speedup labels shown: 23x and 29x]
Details: https://medium.com/intel-analytics-software/performance-optimizations-for-end-to-end-ai-pipelines-231e0966505a
Intel® Xeon Platinum 8280L @ 28 cores; For workloads and configurations visit www.Intel.com/PerformanceIndex.
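The phase-wise breakdowns matter because the end-to-end gain is limited by whatever is not accelerated (Amdahl's law): if phase i takes fraction f_i of the baseline time and is sped up by s_i, the total speedup is 1 / Σ_i (f_i / s_i). A small Python sketch of that formula follows; the phase fractions and per-phase speedups in it are hypothetical illustrations, not the Census or PLAsTiCC measurements.

```python
from typing import Dict


def end_to_end_speedup(fractions: Dict[str, float], speedups: Dict[str, float]) -> float:
    """Amdahl-style overall speedup of a multi-phase pipeline.

    fractions: share of the *baseline* time spent in each phase (must sum to 1).
    speedups: per-phase speedup; phases missing from it are treated as 1x.
    """
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"phase fractions must sum to 1, got {total}")
    new_time = sum(f / speedups.get(phase, 1.0) for phase, f in fractions.items())
    return 1.0 / new_time


# Hypothetical breakdown (NOT the measured Census/PLAsTiCC numbers):
phases = {"readcsv": 0.2, "etl": 0.5, "ml": 0.3}

if __name__ == "__main__":
    print(end_to_end_speedup(phases, {"readcsv": 10.0, "etl": 50.0, "ml": 20.0}))
    print(end_to_end_speedup(phases, {"etl": 50.0, "ml": 20.0}))  # readcsv left as-is
```

With these made-up numbers, accelerating all three phases gives about 22x, while leaving readcsv (20% of the baseline) untouched caps the total at about 4.4x, which illustrates why every phase of the pipeline, not only the ML step, needs optimizing.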
AI APPLICATIONS FROM PARTNERSHIPS
Athlete Training, Telecom Network Quality, Drug Discovery
SUMMARY AND CALL-TO-ACTION
Software AI Accelerators can deliver orders-of-magnitude performance gains
Even more potential for the AI software community:
▪ Create compiler technologies to automate kernel optimizations
▪ Increase parallelism to achieve higher compute utilization
▪ Optimize for memory bandwidth, memory size, NUMA
▪ Scale to large distributed compute
Find more at: ai.intel.com
NOTICES & DISCLAIMERS
▪ Results have been estimated or simulated.
▪ Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
▪ Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
▪ Your costs and results may vary.
▪ Intel technologies may require enabled hardware, software or service activation.
▪ All product plans and roadmaps are subject to change without notice.
▪ Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies.
▪ © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Intel® Software
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Intel® Software
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
Intel® Software
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Intel® Software
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
Intel® Software
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
Intel® Software
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
Intel® Software
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
Intel® Software
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Intel® Software
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Intel® Software
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
Intel® Software
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
Intel® Software
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
Intel® Software
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Intel® Software
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Software
 
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
Intel® Software
 
Ad

Recently uploaded (20)

FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
FL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full VersionFL Studio Producer Edition Crack 2025 Full Version
FL Studio Producer Edition Crack 2025 Full Version
tahirabibi60507
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Mastering Fluent Bit: Ultimate Guide to Integrating Telemetry Pipelines with ...
Eric D. Schabell
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Revolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptxRevolutionizing Residential Wi-Fi PPT.pptx
Revolutionizing Residential Wi-Fi PPT.pptx
nidhisingh691197
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
How to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud PerformanceHow to Optimize Your AWS Environment for Improved Cloud Performance
How to Optimize Your AWS Environment for Improved Cloud Performance
ThousandEyes
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
The Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdfThe Significance of Hardware in Information Systems.pdf
The Significance of Hardware in Information Systems.pdf
drewplanas10
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
What Do Contribution Guidelines Say About Software Testing? (MSR 2025)
Andre Hora
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 

Software AI Accelerators: The Next Frontier | Software for AI Optimization Summit 2021 Keynote

  • 1. Software AI Accelerators: The Next Frontier. Software for AI Optimization Summit. Wei Li, VP & GM, Machine Learning Performance, Intel Corporation
  • 3. SOFTWARE AI ACCELERATORS: HW acceleration alone vs. HW acceleration with SW acceleration, up to 10-100x. Photo Source: NASA
  • 4. AI HARDWARE SPECTRUM: from general purpose to purpose built (CPU, GPU, Accelerators).
  • 5. UNSCALABLE TO SCALABLE SOFTWARE. [Diagram: on the unscalable side, each device (GPU, CPU, Accelerator [1] … Accelerator [N]) carries its own full stack of Middleware, Frameworks and Runtimes; Low Level Libraries; Virtualization/Orchestration; OS; Drivers; FW IP & BIOS. On the scalable side, shared Services & Solutions, Applications, and Middleware, Frameworks and Runtimes layers sit on top of GPU, Accelerators and CPU.]
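The scalable side of this diagram puts one shared software layer above many devices. A minimal Python sketch of that idea, assuming a hypothetical kernel registry: applications call a single entry point and the layer picks a device-specific implementation. The names `register`, `run` and the device strings are illustrative only, not an Intel or oneAPI API.

```python
from typing import Callable, Dict, List, Tuple

Kernel = Callable[..., List[float]]
_KERNELS: Dict[Tuple[str, str], Kernel] = {}  # (operation, device) -> implementation

def register(op: str, device: str) -> Callable[[Kernel], Kernel]:
    """Decorator that records a device-specific kernel for one operation."""
    def wrap(fn: Kernel) -> Kernel:
        _KERNELS[(op, device)] = fn
        return fn
    return wrap

@register("add", "cpu")
def _add_cpu(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]

@register("add", "accelerator")
def _add_accelerator(a: List[float], b: List[float]) -> List[float]:
    # Placeholder: a real backend would hand the buffers to a device runtime.
    return [x + y for x, y in zip(a, b)]

def run(op: str, device: str, *args: List[float]) -> List[float]:
    """Single entry point shared by all applications, whichever device runs the work."""
    try:
        kernel = _KERNELS[(op, device)]
    except KeyError:
        raise NotImplementedError(f"no {op!r} kernel registered for {device!r}") from None
    return kernel(*args)

print(run("add", "cpu", [1.0, 2.0], [3.0, 4.0]))  # prints [4.0, 6.0]
```

Adding a device then means registering new kernels, while application code calling `run` stays unchanged.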
  • 6. AI SOFTWARE STACK. [Diagram layers, top to bottom: Data Scientists & Developers; AI/Analytics Tools, Toolkits, Verticals; Deep Learning, Machine Learning, Big Data Frameworks; Libraries & Compilers; HW (CPU, GPU, Accelerators). Components shown: Intel® LPOT (Low Precision Optimization Tool), Analytics Zoo, Intel® oneAPI AI Analytics Toolkit, SigOpt, PaddlePaddle, TensorFlow, PyTorch, MXNet, Python/Numba, TVM, Spark SQL + ML/DL scale-out, Modin, NumPy, XGBoost, Scikit-Learn, Pandas, OpenVINO.]
  • 7. KERNEL OPTIMIZATION EXAMPLE: A simple program is good, but may be slow. Optimized convolution in oneDNN. Optimizations: vectorization, data reuse, parallelization.
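Slide 7 contrasts a simple convolution with an optimized one. The plain-Python sketch below shows one common restructuring, im2col lowering: every input patch is unrolled into a row, so the convolution becomes a matrix product in which the single flattened filter is reused against every patch. This illustrates the idea only; it is not oneDNN's actual kernel, and pure Python will not show the speedup that vectorized, parallel native code gets from such a layout.

```python
from typing import List

Matrix = List[List[float]]

def conv2d_direct(img: Matrix, k: Matrix) -> Matrix:
    """Naive 'valid' 2-D convolution (cross-correlation, as DL frameworks define it)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += img[i + di][j + dj] * k[di][dj]
            out[i][j] = acc
    return out

def im2col(img: Matrix, kh: int, kw: int) -> Matrix:
    """Unroll every kh x kw input patch into one row (row-major within the patch)."""
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [img[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(oh) for j in range(ow)
    ]

def conv2d_im2col(img: Matrix, k: Matrix) -> Matrix:
    """The same convolution expressed as a product of the patch matrix and the filter."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    kflat = [v for row in k for v in row]  # one flattened filter, reused for every patch
    flat = [sum(a * b for a, b in zip(patch, kflat)) for patch in im2col(img, kh, kw)]
    return [flat[r * ow:(r + 1) * ow] for r in range(oh)]
```

In a native library the patch-by-filter product is a dense matrix multiply, which is exactly the shape that vectorizes well, blocks well for cache reuse, and splits cleanly across cores.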
  • 8. GRAPH OPTIMIZATION EXAMPLE. [Figure: a baseline graph of Conv1x1, Conv3x3, BatchNorm, ReLU and Sum nodes next to the INT8 optimized model (generated by Intel Low Precision Optimization Tool), where BN folding, Conv + ReLU fusion and Conv + Sum fusion progressively reduce the graph to fused Conv1x1'', Conv3x3'' and Sum' nodes.] INT8 multiply-accumulate: 64 input pairs A0, B0 … A63, B63 update 16 accumulators C0 … C15, each adding four products, $C_i \leftarrow C_i + \sum_{k=0}^{3} A_{4i+k} B_{4i+k}$ for $i = 0, \dots, 15$; that is, C0 = A0·B0 + A1·B1 + A2·B2 + A3·B3 + C0, …, C15 = A60·B60 + A61·B61 + A62·B62 + A63·B63 + C15.
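BN folding is exact at inference time because BatchNorm then applies a fixed per-channel affine map: BN(Wx + b) = s·(Wx + b - mean) + beta with s = gamma / sqrt(var + eps) per output channel, so the conv can absorb it as W' = s·W (row by row) and b' = s·(b - mean) + beta. A minimal sketch for a 1x1 convolution at a single pixel, with Conv + ReLU fusion added on top; the function names and the eps value in the test are illustrative assumptions, not LPOT's API.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]

def conv1x1(w: Mat, b: Vec, x: Vec) -> Vec:
    """A 1x1 convolution at one pixel is y = W x + b (W has shape C_out x C_in)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bo for row, bo in zip(w, b)]

def batchnorm(y: Vec, gamma: Vec, beta: Vec, mean: Vec, var: Vec, eps: float) -> Vec:
    """Inference-mode BatchNorm using frozen running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]

def fold_bn(w: Mat, b: Vec, gamma: Vec, beta: Vec, mean: Vec, var: Vec,
            eps: float) -> Tuple[Mat, Vec]:
    """Fold BN into the conv: scale each output channel's weights and shift its bias."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    w_f = [[sc * wi for wi in row] for sc, row in zip(scale, w)]
    b_f = [sc * (bo - m) + bt for sc, bo, m, bt in zip(scale, b, mean, beta)]
    return w_f, b_f

def conv1x1_relu(w: Mat, b: Vec, x: Vec) -> Vec:
    """Conv + ReLU fusion: clamp each output in the same pass, with no separate ReLU node."""
    return [max(0.0, v) for v in conv1x1(w, b, x)]
```

After folding, the Conv, BatchNorm and ReLU nodes collapse into one `conv1x1_relu` call on the folded weights, which is why the optimized graph on the slide has fewer nodes.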
  • 9. Intel Optimization for TENSORFLOW: IMMEDIATE PERFORMANCE BENEFITS. Configuration: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor with 1 TB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xd000280, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-73-generic1, Intel 900GB SSD OS Drive; ResNet50 v1.5, FP32/INT8, BS=128, https://github.com/IntelAI/models/blob/master/benchmarks/image_recognition/tensorflow/resnet50v1_5/README.md; SSD-MobileNetv1, FP32/INT8, BS=448, https://github.com/IntelAI/models/blob/master/benchmarks/object_detection/tensorflow/ssd-mobilenet/README.md. Software: TensorFlow 2.4.0 for FP32 & Intel-Tensorflow (icx-base) for both FP32 and INT8, tested by Intel on 5/12/2021. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex.
  • 10. Intel Optimization for TENSORFLOW: IMMEDIATE PERFORMANCE BENEFITS. Configuration and software as listed for slide 9. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex.
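The INT8 results on slides 9 and 10 rest on the multiply-accumulate pattern from slide 8: narrow integer products summed into wider integer accumulators, then one rescale back to float. A stdlib-only sketch, assuming symmetric per-tensor quantization to [-127, 127] (a common scheme, not necessarily the one LPOT or Intel-Tensorflow uses); operand-type details of hardware int8 dot-product instructions are not modeled, and the test vectors are arbitrary.

```python
from typing import List, Tuple

def quantize(xs: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    amax = max(abs(v) for v in xs) or 1.0  # avoid a zero scale for an all-zero tensor
    scale = amax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in xs]
    return q, scale

def dot_int8_grouped(a: List[int], b: List[int], acc: List[int]) -> List[int]:
    """Slide 8 pattern: C_i += A[4i]*B[4i] + ... + A[4i+3]*B[4i+3] for every lane i."""
    assert len(a) == len(b) == 4 * len(acc), "need exactly 4 input pairs per accumulator"
    # Python ints are unbounded; hardware lanes are 32-bit, which is ample for 4 products.
    return [c + sum(a[4 * i + k] * b[4 * i + k] for k in range(4))
            for i, c in enumerate(acc)]

def int8_dot(x: List[float], y: List[float]) -> float:
    """Approximate a float dot product using INT8 inputs and integer accumulation."""
    qx, sx = quantize(x)
    qy, sy = quantize(y)
    lanes = dot_int8_grouped(qx, qy, [0] * (len(x) // 4))
    return sum(lanes) * sx * sy  # one multiply dequantizes the integer total

x = [((i * 37) % 19 - 9) / 10 for i in range(64)]  # 64 pairs -> 16 accumulators
y = [((i * 11) % 23 - 11) / 7 for i in range(64)]
print(int8_dot(x, y), sum(p * q for p, q in zip(x, y)))
```

The two printed values agree closely; the gap is the rounding error introduced by quantization, which tools like LPOT aim to keep within an accuracy budget.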
  • 11. Intel Optimization for PYTORCH: IMMEDIATE PERFORMANCE BENEFITS. Configuration: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor with 1 TB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xd000280, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-73-generic1, Intel 900GB SSD OS Drive; ResNet50 v1.5, FP32/INT8, BS=128, https://github.com/IntelAI/models/blob/icx-launch-public/quickstart/ipex-bkc/resnet50-icx/inference; DLRM, FP32/INT8, BS=16, https://github.com/IntelAI/models/blob/icx-launch-public/quickstart/ipex-bkc/dlrm-icx/inference/fp32/README.md. Software: PyTorch v1.5 w/o DNNL build for FP32 & PyTorch v1.5 + IPEX (icx) for both FP32 and INT8, tested by Intel on 5/12/2021. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex.
  • 12. Intel Optimization for PYTORCH: IMMEDIATE PERFORMANCE BENEFITS. Configuration and software as listed for slide 11. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex. Photo Source: NASA
  • 13. Intel Optimization for MXNET: IMMEDIATE PERFORMANCE BENEFITS. Configuration: Platinum 8380: 1-node, 2x Intel Xeon Platinum 8380 processor with 1 TB (16 slots/ 64GB/3200) total DDR4 memory, ucode 0xd000280, HT on, Turbo on, Ubuntu 20.04.1 LTS, 5.4.0-73-generic1, Intel 900GB SSD OS Drive; ResNet50 v1, FP32/INT8, BS=128, https://github.com/apache/incubator-mxnet/blob/v2.0.0.alpha/python/mxnet/gluon/model_zoo/vision/resnet.py; MobileNetv2, FP32/INT8, BS=128, https://github.com/apache/incubator-mxnet/blob/v2.0.0.alpha/python/mxnet/gluon/model_zoo/vision/mobilenet.py. Software: MXNet 2.0.0.alpha w/o DNNL build for FP32 & MXNet 2.0.0.alpha for both FP32 and INT8, tested by Intel on 5/12/2021. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex.
  • 14. Intel Optimization for MXNET: IMMEDIATE PERFORMANCE BENEFITS. Configuration and software as listed for slide 13. Results may vary. For workloads and configurations visit www.Intel.com/PerformanceIndex. Photo Source: NASA
  • 15. Intel Extension for Scikit-learn. Intel Xeon Platinum 8276L CPU @ 2.20 GHz, 2 sockets, 28 cores per socket. For workloads and configurations visit www.Intel.com/PerformanceIndex. Details: https://medium.com/intel-analytics-software/accelerate-your-scikit-learn-applications-a06cacf44912
  • 16. PERFORMANCE IN KAGGLE COMPETITIONS
    Kaggle challenge | Domain | Algorithm(s) | Stock E2E time (minutes) | Intel Extension for Scikit-learn E2E time (minutes) | Speedup
    KDD Cup 1999 | Computer Networks | kNN | 282 | 1.24 | 227.4x
    Credit Card Default | Finance | SVC | 11.9 | 0.2 | 59.5x
    Digit Recognizer (KNN) | Image Classification | SVC | 84.32 | 1.47 | 57.5x
    Melanoma Identification | Image Classification | kNN | 99.89 | 2.08 | 48x
    Digit Recognizer (SVM) | Image Classification | PCA, SVC | 125.5 | 4.92 | 25.5x
    What's cooking? | Natural Language Processing | SVC, XGBoost | 35.8 | 2.66 | 13.5x
    Real or Not? Disaster Tweets | Natural Language Processing | SVC | 37.8 | 4.27 | 8.9x
    Home Credit Default | Finance | Random Forest | 2.9 | 1.44 | 2x
    Intel Xeon Gold 5218 @ 2.3 GHz (2nd generation Intel Xeon Scalable processors): 2 sockets, 16 cores per socket, HT: off, Turbo: off. For workloads and configurations visit www.Intel.com/PerformanceIndex. Details: https://medium.com/intel-analytics-software/accelerate-kaggle-challenges-using-intel-ai-analytics-toolkit-beb148f66d5a
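The speedup column in this table is the stock end-to-end time divided by the Intel Extension for Scikit-learn time. A quick check of that arithmetic, using only the numbers in the table: the computed ratios agree with the reported ones to within 1% (for example, 84.32 / 1.47 is about 57.4 against a reported 57.5x), with the small differences presumably due to rounding of the displayed times.

```python
# (challenge, stock minutes, optimized minutes, reported speedup), copied from the table
ROWS = [
    ("KDD Cup 1999", 282, 1.24, 227.4),
    ("Credit Card Default", 11.9, 0.2, 59.5),
    ("Digit Recognizer (KNN)", 84.32, 1.47, 57.5),
    ("Melanoma Identification", 99.89, 2.08, 48),
    ("Digit Recognizer (SVM)", 125.5, 4.92, 25.5),
    ("What's cooking?", 35.8, 2.66, 13.5),
    ("Real or Not? Disaster Tweets", 37.8, 4.27, 8.9),
    ("Home Credit Default", 2.9, 1.44, 2),
]

def speedup(stock_min: float, optimized_min: float) -> float:
    """End-to-end speedup = stock time / optimized time."""
    return stock_min / optimized_min

for name, stock, opt, reported in ROWS:
    print(f"{name:30s} computed {speedup(stock, opt):7.2f}x  reported {reported}x")
```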
  • 17. GRAPH ANALYTICS WITH oneDAL: Triangle Counting Algorithm. Speedup due to relabeling (V = vertices, E = edges; chart on a log scale). Reported speedups, in chart order: 1.38, 1.67, 1.74, 1.82, 2.98, 8.02, 166.1. Data sets: Enron (V: 0.03M, E: 0.4M), Pokec (V: 1.6M, E: 30.6M), Google (V: 0.9M, E: 5.1M), Indochina-2004 (V: 7.4M, E: 151M), Wikipedia (V: 12.1M, E: 378M), Twitter (V: 61M, E: 1202M), Web (V: 50M, E: 1810M). Intel Xeon Platinum 8280 CPU @ 2.70 GHz, 2x28 cores, HT: on. For workloads and configurations visit www.Intel.com/PerformanceIndex. Data sets: https://github.com/sbeamer/gapbs | https://snap.stanford.edu/data
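The slide credits the triangle-counting speedup to relabeling. A minimal pure-Python sketch of degree-based relabeling, assuming the common scheme of numbering vertices by ascending degree and counting each triangle once along edges oriented from lower to higher ID; the function names are illustrative and this is not oneDAL's implementation. After relabeling, each vertex's forward list holds only neighbors of equal or higher degree, so high-degree hubs keep short lists and the set intersections stay small.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]

def relabel_by_degree(edges: List[Edge]) -> List[Edge]:
    """Rename vertices so that lower-degree vertices get smaller IDs."""
    deg: Dict[int, int] = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    order = sorted(deg, key=lambda n: (deg[n], n))  # ties broken by original ID
    new_id = {old: i for i, old in enumerate(order)}
    return [(new_id[u], new_id[v]) for u, v in edges]

def count_triangles(edges: List[Edge]) -> int:
    """Count each triangle u < v < w exactly once via forward-neighbor intersections."""
    fwd: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; the set also drops duplicate edges
            fwd[min(u, v)].add(max(u, v))
    total = 0
    for u, nbrs in fwd.items():
        for v in nbrs:
            # w must be a forward neighbor of both u and v; .get avoids mutating fwd
            total += len(nbrs & fwd.get(v, set()))
    return total
```

Relabeling changes only the cost, not the answer: `count_triangles(edges)` and `count_triangles(relabel_by_degree(edges))` return the same count.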
  • 18. E2E WORKLOAD PERFORMANCE. [Charts: phase-wise % breakdown (Readcsv, ETL, ML, plus Train Test Split for CENSUS) and total-time speedup with unoptimized software, optimized software, and optimized hyperparameters, for the CENSUS and PLAsTiCC workloads (higher is better); overall speedups shown: 23x and 29x.] Details: https://medium.com/intel-analytics-software/performance-optimizations-for-end-to-end-ai-pipelines-231e0966505a Intel® Xeon Platinum 8280L @ 28 cores. For workloads and configurations visit www.Intel.com/PerformanceIndex.
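A phase-wise % breakdown like the one on this slide comes from timing each pipeline stage separately, which shows where optimization effort pays off end to end. A stdlib-only sketch with a hypothetical three-phase pipeline (Readcsv, ETL, ML) on synthetic data; the phase functions and data are illustrative assumptions, and the measured percentages will differ from machine to machine.

```python
import csv
import io
import time
from typing import Callable, Dict, List, Tuple

def timed_phases(phases: Dict[str, Callable], data: object) -> Tuple[object, Dict[str, float]]:
    """Run phases in order, feeding each output to the next; record seconds per phase."""
    times: Dict[str, float] = {}
    for name, fn in phases.items():
        start = time.perf_counter()
        data = fn(data)
        times[name] = time.perf_counter() - start
    return data, times

def breakdown(times: Dict[str, float]) -> Dict[str, float]:
    """Convert per-phase seconds into a percentage of total pipeline time."""
    total = sum(times.values()) or 1.0  # guard against a zero total on very fast runs
    return {name: 100.0 * t / total for name, t in times.items()}

RAW = "x,y\n" + "".join(f"{i},{2 * i + 1}\n" for i in range(1000))  # synthetic y = 2x + 1

def read_csv(text: str) -> List[dict]:
    return list(csv.DictReader(io.StringIO(text)))

def etl(rows: List[dict]) -> List[Tuple[float, float]]:
    return [(float(r["x"]), float(r["y"])) for r in rows]  # parse strings to numbers

def ml(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)
    return a, my - a * mx

model, secs = timed_phases({"Readcsv": read_csv, "ETL": etl, "ML": ml}, RAW)
for phase, pct in breakdown(secs).items():
    print(f"{phase}: {pct:.1f}%")
```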
  • 19. AI APPLICATIONS FROM PARTNERSHIPS: Athlete Training, Telecom Network Quality, Drug Discovery
  • 20. SUMMARY AND CALL-TO-ACTION: Software AI Accelerators can deliver orders-of-magnitude performance gains. There is even more potential for the AI software community: ▪ Create compiler technologies to automate kernel optimizations ▪ Increase parallelism to achieve higher compute utilization ▪ Optimize for memory bandwidth, memory size, and NUMA ▪ Scale to large distributed compute. Find more at: ai.intel.com
  • 21. NOTICES & DISCLAIMERS: ▪ Results have been estimated or simulated. ▪ Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. ▪ Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. ▪ Your costs and results may vary. ▪ Intel technologies may require enabled hardware, software or service activation. ▪ All product plans and roadmaps are subject to change without notice. ▪ Intel contributes to the development of benchmarks by participating in, sponsoring, and/or contributing technical support to various benchmarking groups, including the BenchmarkXPRT Development Community administered by Principled Technologies. ▪ © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.