Practical Approaches to DNN Quantization
Dwith Chenna
Senior Embedded DSP Eng., Computer Vision
Magic Leap Inc.
Contents
• Why Quantization?
• Quantization Scheme
• Types of Quantization
• Post Training Quantization
• Quantization Tools
• Network Architecture
• Calibration Dataset
• Min/Max Tuning
• Quantization Evaluation
• Quantization Analysis
• Quantization Aware Training
• Best Practices
© 2023 Magic Leap
Why Quantization?
• Quantization is a powerful tool for enabling deep learning on edge devices
• Edge devices are resource-constrained hardware with limited memory and low power budgets
Why Quantization?
• Model compression: up to 4x smaller network size and memory bandwidth (float32 to int8)
• Latency reduction: up to 2x-3x, since int8 compute is significantly faster than float32 [1]
• Trade-off: potential loss of model accuracy
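Quantization Scheme
• Convert full-precision floating-point numbers to int8 [2]:
$$q = \mathrm{clamp}\left(\mathrm{round}\left(\frac{r}{s}\right) + z,\; q_{\min},\; q_{\max}\right)$$
  q - quantized value, r - real value, s - scale, z - zero point
• Quantized value back to floating-point representation:
$$r = s\,(q - z)$$
• For a floating-point distribution with range [r_min, r_max], the scale and zero point are obtained as:
$$s = \frac{r_{\max} - r_{\min}}{q_{\max} - q_{\min}}, \qquad z = q_{\min} - \mathrm{round}\left(\frac{r_{\min}}{s}\right)$$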
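A minimal NumPy sketch of the affine mapping on this slide, assuming an unsigned 8-bit target (q_min = 0, q_max = 255). The helpers `compute_qparams`, `quantize`, and `dequantize` are illustrative names, not a library API; the range is widened to include 0 so that real zero maps exactly onto the integer zero point.

```python
import numpy as np

def compute_qparams(r_min, r_max, q_min=0, q_max=255):
    """Scale and zero point from a real-valued range (affine / asymmetric)."""
    # Widen the range to include 0 so that 0.0 is exactly representable.
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    s = (r_max - r_min) / (q_max - q_min)
    z = int(round(q_min - r_min / s))
    return s, z

def quantize(r, s, z, q_min=0, q_max=255):
    # q = clamp(round(r / s) + z, q_min, q_max)
    return np.clip(np.round(r / s) + z, q_min, q_max).astype(np.int32)

def dequantize(q, s, z):
    # r ~= s * (q - z)
    return s * (q.astype(np.float32) - z)

r = np.array([-1.0, -0.5, 0.0, 0.25, 2.0], dtype=np.float32)
s, z = compute_qparams(r.min(), r.max())
q = quantize(r, s, z)
r_hat = dequantize(q, s, z)
print("scale:", s, "zero point:", z)
print("q:", q, "dequantized:", r_hat)
```

Because 0.0 lands exactly on the zero point, values such as zero padding and ReLU zeros carry no quantization error; every other value is off by at most s/2.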
Quantization Scheme: Symmetric
• Assumes a symmetric distribution for simplicity, zero point = 0
• Symmetric per tensor
  • Calculate scale for the entire tensor
• Symmetric per channel
  • Calculate scale for each channel of the tensor
• Computationally efficient
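A sketch of how the two symmetric variants differ, assuming the restricted int8 range [-127, 127] and a hypothetical conv weight tensor laid out as [out_ch, in_ch, kh, kw]. With one scale per output channel, small-range channels use the full int8 grid instead of collapsing to zero under the scale set by the widest channel.

```python
import numpy as np

def symmetric_scale(w, axis=None, num_bits=8):
    """Symmetric scale (zero point 0). axis=None -> one scale per tensor,
    otherwise one scale per slice along `axis` (per channel)."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for int8
    if axis is None:
        max_abs = np.max(np.abs(w))
    else:
        reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
        max_abs = np.max(np.abs(w), axis=reduce_axes, keepdims=True)
    return np.maximum(max_abs, 1e-12) / q_max

def fake_quant_symmetric(w, scale, num_bits=8):
    """Quantize to the symmetric grid and map back to float."""
    q_max = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(w / scale), -q_max, q_max) * scale

rng = np.random.default_rng(0)
# Hypothetical weights whose output channels have very different ranges.
channel_gain = np.array([0.01, 0.1, 1.0, 10.0]).reshape(4, 1, 1, 1)
w = rng.standard_normal((4, 8, 3, 3)) * channel_gain

s_tensor = symmetric_scale(w)           # scalar
s_channel = symmetric_scale(w, axis=0)  # shape (4, 1, 1, 1)

err_tensor = np.abs(fake_quant_symmetric(w, s_tensor) - w).mean(axis=(1, 2, 3))
err_channel = np.abs(fake_quant_symmetric(w, s_channel) - w).mean(axis=(1, 2, 3))
print("per-tensor  mean abs error per channel:", err_tensor)
print("per-channel mean abs error per channel:", err_channel)
```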
Quantization Scheme: Asymmetric
• Accounts for shifts in the distribution, giving better utilization of the quantization range
• Asymmetric per tensor
  • Scale and zero point for the entire tensor
• Asymmetric per channel
  • Scale and zero point for each channel of the tensor
• Better handling of diverse distributions
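To illustrate the "better utilization of the quantization range" point, the sketch below quantizes a hypothetical ReLU6-style activation (values only in [0, 6]) both ways. A symmetric int8 grid spends half of its codes on negative values that never occur; an asymmetric uint8 grid maps the whole observed range onto all 256 codes. The data and sizes are assumptions chosen for illustration.

```python
import numpy as np

def quant_error(x, s, z, q_min, q_max):
    """Mean absolute error after quantize -> dequantize with scale s, zero point z."""
    q = np.clip(np.round(x / s) + z, q_min, q_max)
    return np.abs(s * (q - z) - x).mean()

rng = np.random.default_rng(0)
# Hypothetical one-sided activation, e.g. the output of ReLU6.
x = rng.uniform(0.0, 6.0, size=10_000)

# Symmetric int8: zero point 0, range [-max|x|, max|x|]; only codes 0..127 get used.
s_sym = np.abs(x).max() / 127
err_sym = quant_error(x, s_sym, 0, -127, 127)

# Asymmetric uint8: range [min(x, 0), max(x)] spread over all 256 codes.
r_min, r_max = min(x.min(), 0.0), x.max()
s_asym = (r_max - r_min) / 255
z_asym = int(round(-r_min / s_asym))
err_asym = quant_error(x, s_asym, z_asym, 0, 255)

print(f"symmetric : scale={s_sym:.5f}, mean abs error={err_sym:.5f}")
print(f"asymmetric: scale={s_asym:.5f}, mean abs error={err_asym:.5f}")
```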
Types of Quantization
• Post Training Quantization (PTQ)
  • Simple yet efficient
  • Uses an already trained model and a calibration dataset
• Quantization Aware Training (QAT)
  • Emulates inference-time quantization
  • Resource intensive, as it requires retraining
Post Training Quantization
• Dynamic Quantization
  • Weights are quantized ahead of time
  • Activations are quantized dynamically during inference
• Static Quantization
  • Both weights and activations are quantized, with activation ranges fixed ahead of time
  • Saves memory bandwidth and compute
  • Needs a representative dataset
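The difference between dynamic and static PTQ comes down to when the activation scale is chosen. The NumPy sketch below assumes a single hypothetical linear layer with symmetric int8 weights and activations and an int32 accumulator; real runtimes fold the final rescaling into their integer kernels.

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric int8 quantization (zero point 0, range [-127, 127])."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32)).astype(np.float32)  # hypothetical Linear weight [out, in]

# In both variants the weights are quantized once, ahead of time.
w_scale = np.abs(w).max() / 127
w_q = quantize_int8(w, w_scale)

def int8_linear(x, x_scale):
    """int8 x int8 products accumulated in int32, then rescaled back to float."""
    x_q = quantize_int8(x, x_scale)
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    return acc * (x_scale * w_scale)

def linear_dynamic(x):
    # Dynamic PTQ: the activation scale is computed from this input at inference time.
    return int8_linear(x, np.abs(x).max() / 127)

# Static PTQ: the activation scale is fixed offline from a (hypothetical) calibration set.
calibration_batches = [rng.standard_normal((8, 32)).astype(np.float32) for _ in range(10)]
static_x_scale = max(np.abs(b).max() for b in calibration_batches) / 127

def linear_static(x):
    return int8_linear(x, static_x_scale)

x = rng.standard_normal((4, 32)).astype(np.float32)
reference = x @ w.T
print("dynamic max abs error:", np.abs(linear_dynamic(x) - reference).max())
print("static  max abs error:", np.abs(linear_static(x) - reference).max())
```

Dynamic quantization pays for a min/max reduction on every input but needs no calibration data. Static quantization avoids that runtime cost, but inputs outside the calibrated range get clipped.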
Post Training Quantization
• What is the best quantization scheme for deep neural networks?
• Weights: Symmetric per channel
  • Their static distribution makes weights easy to quantize
  • Weight distributions tend to be symmetric [3]
  • Symmetric per channel handles the diversity of weight distributions across channels
[Figure: Empirical distribution in a pre-trained network]
Post Training Quantization
• Activations: Asymmetric/Symmetric per tensor
  • The distribution changes with every inference, which makes its statistics difficult to determine
  • Statistics are approximated using a representative/calibration dataset
  • Batch normalization yields distributions that are better suited to quantization
Quantization Tools: TFLite
• TFLite supports 8-bit integer PTQ [1]
• Quantization scheme
  • Weights: Symmetric per channel
  • Activations: Asymmetric per tensor
• Quantization analysis
  • Selective quantization with mixed precision (float32/16 + int8/int16)
  • Layerwise quantization error with custom metrics
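A sketch of full-integer PTQ with the TFLite converter, following the pattern in the TensorFlow Lite post-training quantization guide [1]. The toy Keras model and the random `representative_dataset` generator are placeholders; a real conversion would feed ~100-1000 representative samples. With these settings, TFLite applies the default scheme listed above (symmetric per-channel weights, asymmetric per-tensor activations).

```python
import numpy as np
import tensorflow as tf

# Placeholder model; substitute the trained float model to be quantized.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

def representative_dataset():
    # Placeholder calibration data: random tensors stand in for real samples.
    for _ in range(100):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Require int8 kernels for every op (conversion fails instead of falling back to float).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
print("quantized model size (bytes):", len(tflite_model))
```

For the int16-activation variant, the same guide replaces the op set with `tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8`.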
Quantization Tools: PyTorch
• PyTorch supports 8-bit integer PTQ [4]
• Quantization scheme
  • Weights: (A)symmetric per tensor/channel
  • Activations: (A)symmetric per tensor/channel
• Quantization analysis
  • Layerwise quantization error through custom metrics
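A sketch of selecting these schemes with the observers in `torch.ao.quantization` [4]: symmetric per-channel int8 for a weight and asymmetric per-tensor uint8 for an activation. The tensors are random placeholders. In a full PTQ flow these observers are normally attached through a `QConfig` rather than called by hand.

```python
import torch
from torch.ao.quantization import MinMaxObserver, PerChannelMinMaxObserver

torch.manual_seed(0)
# Hypothetical conv weight [out_ch, in_ch, kh, kw] with a different range per output channel.
weight = torch.randn(16, 8, 3, 3) * torch.linspace(0.1, 2.0, 16).view(16, 1, 1, 1)
# Hypothetical post-ReLU activation (non-negative, so an asymmetric range fits it better).
activation = torch.relu(torch.randn(1, 16, 8, 8))

# Weights: symmetric per-channel int8 (zero points fixed at 0).
w_obs = PerChannelMinMaxObserver(
    ch_axis=0, dtype=torch.qint8, qscheme=torch.per_channel_symmetric
)
w_obs(weight)  # records per-channel min/max
w_scales, w_zps = w_obs.calculate_qparams()
w_q = torch.quantize_per_channel(weight, w_scales, w_zps, axis=0, dtype=torch.qint8)

# Activations: asymmetric per-tensor uint8.
a_obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
a_obs(activation)
a_scale, a_zp = a_obs.calculate_qparams()
a_q = torch.quantize_per_tensor(activation, float(a_scale), int(a_zp), torch.quint8)

print("weight scales:", w_scales)
print("activation scale / zero point:", float(a_scale), int(a_zp))
```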
Network Architecture
• FLOPs are not everything!
• Neural Architecture Search (NAS)
  • Most NAS-based models (e.g., EfficientNet) try to minimize compute
  • This results in deeper and leaner networks that work well with cache-based systems
Network Architecture
• Efficient architecture for quantization [5]
Network Architecture
• Quantization aware
  • Larger models have redundancy, which makes them more robust to quantization
• Quantization scheme
  • Enables the use of simpler and more efficient quantization schemes
Network Architecture
• Optimization toolchain
  • Aggressive layer fusion for optimal memory bandwidth
  • Optimal quantization parameter selection
• Hardware
  • Better suited to the target hardware: CPU/GPU/DSP/accelerator
Calibration Dataset
• Representative dataset to estimate the activation distribution
• Needs to cover the diversity of the use case
• Size: ~100-1000 images are typically enough for statistically reliable estimates [6]
Min/Max Tuning
• Minimize quantization error and eliminate outliers
• Trade-off: range vs. quantization error
• Mean/standard deviation
  • Assumes a normal distribution
  • Min/Max: mean +/- 3*STD
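A sketch comparing plain min/max with the mean +/- 3*STD range on hypothetical, roughly normal activations that contain one large outlier. It makes the range vs. quantization error trade-off concrete: the clipped range represents typical values far more precisely but saturates the outlier. The asymmetric uint8 helper is an illustrative assumption.

```python
import numpy as np

def range_minmax(x):
    return float(x.min()), float(x.max())

def range_mean_std(x, k=3.0):
    """Clipping range mean +/- k*std; assumes a roughly normal distribution."""
    mu, sigma = float(x.mean()), float(x.std())
    return mu - k * sigma, mu + k * sigma

def fake_quant(x, r_min, r_max, q_min=0, q_max=255):
    """Asymmetric uint8 quantize -> dequantize; values outside the range are clipped."""
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    s = (r_max - r_min) / (q_max - q_min)
    z = round(q_min - r_min / s)
    q = np.clip(np.round(x / s) + z, q_min, q_max)
    return s * (q - z)

rng = np.random.default_rng(0)
x = np.append(rng.standard_normal(10_000), 100.0)  # hypothetical activations + one outlier
inliers, outlier = x[:-1], x[-1:]

lo_mm, hi_mm = range_minmax(x)
lo_3s, hi_3s = range_mean_std(x)

mse_minmax = float(np.mean((fake_quant(inliers, lo_mm, hi_mm) - inliers) ** 2))
mse_3sigma = float(np.mean((fake_quant(inliers, lo_3s, hi_3s) - inliers) ** 2))
outlier_err_minmax = float(np.abs(fake_quant(outlier, lo_mm, hi_mm) - outlier)[0])
outlier_err_3sigma = float(np.abs(fake_quant(outlier, lo_3s, hi_3s) - outlier)[0])

print(f"min/max    range [{lo_mm:.2f}, {hi_mm:.2f}]: inlier MSE {mse_minmax:.2e}, outlier error {outlier_err_minmax:.2f}")
print(f"mean+/-3sd range [{lo_3s:.2f}, {hi_3s:.2f}]: inlier MSE {mse_3sigma:.2e}, outlier error {outlier_err_3sigma:.2f}")
```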
Min/Max Tuning
• Histogram
• Ignore the last x% of values
• Moving average (TensorFlow default)
• Search max/min (PyTorch/TensorRT)
• Find the histogram range that covers most of the entropy
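Two of these strategies sketched in NumPy: a histogram threshold that ignores the largest x% of absolute values, and an exponential moving average of per-batch min/max. The bin count, ignored fraction, and momentum are hypothetical choices. This is not the exact implementation used by TensorFlow, PyTorch, or TensorRT, and the entropy-based search is not shown.

```python
import numpy as np

def histogram_threshold(x, ignore_fraction=1e-3, bins=2048):
    """|x| threshold that ignores the largest `ignore_fraction` of values."""
    hist, edges = np.histogram(np.abs(x), bins=bins)
    cdf = np.cumsum(hist) / hist.sum()
    idx = int(np.searchsorted(cdf, 1.0 - ignore_fraction))  # first bin reaching the target
    return edges[idx + 1]                                    # upper edge of that bin

class MovingAverageMinMax:
    """Exponential moving average of per-batch min/max (momentum is a hypothetical choice)."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.min = None
        self.max = None

    def update(self, batch):
        b_min, b_max = float(batch.min()), float(batch.max())
        if self.min is None:
            self.min, self.max = b_min, b_max
        else:
            self.min += self.momentum * (b_min - self.min)
            self.max += self.momentum * (b_max - self.max)

rng = np.random.default_rng(0)
batches = [rng.standard_normal(1000) for _ in range(20)]
batches[5][0] = 100.0  # a single outlier in one calibration batch

all_values = np.concatenate(batches)
print("global max          :", all_values.max())
print("histogram threshold :", histogram_threshold(all_values))

ema = MovingAverageMinMax()
for b in batches:
    ema.update(b)
print("moving-average max  :", ema.max)
```

Both approaches keep the single outlier from dictating the range. The histogram threshold discards it outright, while the moving average only dilutes its effect and depends on where in the calibration sequence the outlier appears.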
Quantization Evaluation
• Evaluate the best-fit quantization scheme for each model [2]
  • ResNet50: Symmetric per tensor
  • MobileNet: Asymmetric per channel
Quantization Evaluation
• Effects of quantization scheme on model accuracy [2]
• Classification accuracy of the quantized model
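One way to shortlist candidate schemes before running a full accuracy evaluation is a cheap proxy such as the signal-to-quantization-noise ratio (SQNR) of the weights. The sketch below assumes that proxy and a hypothetical conv weight tensor. It does not replace measuring model accuracy on a validation set, as in [2].

```python
import numpy as np

def fake_quant(w, scheme, axis=0):
    """Quantize -> dequantize `w` with one of four 8-bit schemes."""
    per_channel = scheme.endswith("per_channel")
    reduce_axes = tuple(i for i in range(w.ndim) if i != axis) if per_channel else None
    if scheme.startswith("symmetric"):
        max_abs = np.max(np.abs(w), axis=reduce_axes, keepdims=per_channel)
        s = np.maximum(max_abs, 1e-12) / 127
        return np.clip(np.round(w / s), -127, 127) * s
    # Asymmetric uint8: range [min(w, 0), max(w, 0)], integer zero point.
    r_min = np.minimum(np.min(w, axis=reduce_axes, keepdims=per_channel), 0.0)
    r_max = np.maximum(np.max(w, axis=reduce_axes, keepdims=per_channel), 0.0)
    s = np.maximum(r_max - r_min, 1e-12) / 255
    z = np.round(-r_min / s)
    return (np.clip(np.round(w / s) + z, 0, 255) - z) * s

def sqnr_db(x, x_hat):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))

SCHEMES = ["symmetric_per_tensor", "symmetric_per_channel",
           "asymmetric_per_tensor", "asymmetric_per_channel"]

rng = np.random.default_rng(0)
# Hypothetical conv weight [out_ch, in_ch, kh, kw] with per-channel gains.
w = rng.standard_normal((32, 16, 3, 3)) * rng.uniform(0.05, 2.0, size=(32, 1, 1, 1))

results = {scheme: sqnr_db(w, fake_quant(w, scheme)) for scheme in SCHEMES}
for scheme, value in results.items():
    print(f"{scheme:24s} SQNR = {value:5.1f} dB")
best = max(results, key=results.get)
print("best scheme by weight SQNR:", best)
```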
Quantization Analysis
• What to do when quantization fails?
• Individual layer support for quantization
• Identifying a few problematic layers can significantly improve the accuracy of the quantized model
• Common pitfalls
  • Handling input/output quantization
  • Layer fusion before quantization
Quantization Analysis
• Analyse individual layers' sensitivity to quantization [1]
• Selective quantization: mixed-precision inference for testing
[Figure: layer number (x-axis) vs. activation range (y-axis), and root mean square error (rmse) vs. activation range. Many layers have wide ranges, and some layers have high rmse/scale values.]
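A minimal sketch of layerwise sensitivity analysis followed by selective quantization, on a toy NumPy ReLU network with hypothetical weights and one injected outlier in the first layer. Each layer is quantized in isolation and compared against the float output. The most sensitive layer is then kept in float, mimicking mixed-precision testing. This illustrates the idea only; it is not the TFLite debugger API from [1].

```python
import numpy as np

def fake_quant_per_tensor(w):
    """Symmetric per-tensor int8 quantize -> dequantize."""
    s = np.abs(w).max() / 127
    return np.clip(np.round(w / s), -127, 127) * s

def forward(x, weights, quantized_layers=()):
    """Toy ReLU MLP; layers listed in `quantized_layers` use int8 fake-quantized weights."""
    for i, w in enumerate(weights):
        w_used = fake_quant_per_tensor(w) if i in quantized_layers else w
        x = x @ w_used
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
std = np.sqrt(2 / 64)
weights = [rng.normal(0, std, (64, 64)) for _ in range(3)] + [rng.normal(0, std, (64, 8))]
weights[0][3, 7] = 20.0  # hypothetical outlier weight in layer 0

x = rng.standard_normal((64, 64))
reference = forward(x, weights)

# 1) Sensitivity: quantize one layer at a time, keep the rest in float.
sensitivity = [rmse(forward(x, weights, {i}), reference) for i in range(len(weights))]
worst = int(np.argmax(sensitivity))
print("per-layer rmse:", np.round(sensitivity, 4), "-> most sensitive layer:", worst)

# 2) Selective (mixed-precision) quantization: keep the most sensitive layer in float.
all_layers = set(range(len(weights)))
all_int8_rmse = rmse(forward(x, weights, all_layers), reference)
skip_worst_rmse = rmse(forward(x, weights, all_layers - {worst}), reference)
print("all layers int8 rmse :", all_int8_rmse)
print("worst layer in float :", skip_worst_rmse)
```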
Quantization Analysis
• Non-linear activations: precision requirements and quantization support
  • ReLU/ReLU6 preferred over Sigmoid/LeakyReLU
• Weight/activation distribution: visualization or metrics for the data distribution, e.g., range
• Layer fusion (Conv + BN + ReLU / Conv + BN / Conv + ReLU) before quantization
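Conv + BN fusion amounts to folding the BatchNorm statistics into the preceding layer's weights and bias, so the quantizer sees a single weight tensor instead of a conv followed by a separate scale and shift. The sketch below assumes a 1x1 convolution, which is a per-pixel matrix multiply, to keep it short. The same per-output-channel factor applies to k x k kernels.

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding conv/linear layer.
    w: [out_ch, ...]; b, gamma, beta, mean, var: [out_ch]."""
    factor = gamma / np.sqrt(var + eps)                        # per output channel
    w_folded = w * factor.reshape(-1, *([1] * (w.ndim - 1)))   # scale each output channel
    b_folded = beta + (b - mean) * factor
    return w_folded, b_folded

rng = np.random.default_rng(0)
out_ch, in_ch = 8, 16
w = rng.standard_normal((out_ch, in_ch))   # hypothetical 1x1 conv weight
b = rng.standard_normal(out_ch)
gamma, beta = rng.uniform(0.5, 2.0, out_ch), rng.standard_normal(out_ch)
mean, var = rng.standard_normal(out_ch), rng.uniform(0.5, 2.0, out_ch)

x = rng.standard_normal((32, in_ch))       # 32 pixels
# Unfused: Conv -> BN -> ReLU as three separate steps.
y_unfused = np.maximum(((x @ w.T + b) - mean) / np.sqrt(var + 1e-5) * gamma + beta, 0.0)

# Fused: a single Conv (+ ReLU) with folded parameters.
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
y_fused = np.maximum(x @ w_f.T + b_f, 0.0)
print("max difference:", np.abs(y_fused - y_unfused).max())
```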
Quantization Analysis
• Use a larger bit width for more sensitive layers, e.g., fully connected layers and the network head
  • Int16 activation support in TFLite
• Min/Max tuning: outlier weights cause all other weights to be represented less precisely
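A quick sketch of how much a wider bit width helps for a tensor with a wide range: a hypothetical activation with mostly small values and a couple of large ones, quantized symmetrically to int8 and to int16. With int8, the large values set a scale so coarse that most small values round to zero.

```python
import numpy as np

def symmetric_fake_quant(x, num_bits):
    """Symmetric per-tensor quantize -> dequantize at the given bit width."""
    q_max = 2 ** (num_bits - 1) - 1  # 127 for int8, 32767 for int16
    s = np.abs(x).max() / q_max
    return np.clip(np.round(x / s), -q_max, q_max) * s

rng = np.random.default_rng(0)
# Hypothetical wide-range activation: mostly small values plus two large ones.
x = np.concatenate([rng.normal(0.0, 0.1, 10_000), [25.0, -30.0]])

rmse = {}
for bits in (8, 16):
    rmse[bits] = float(np.sqrt(np.mean((symmetric_fake_quant(x, bits) - x) ** 2)))
    print(f"int{bits}: rmse = {rmse[bits]:.6f}")
```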
Quantization Analysis
• Large differences in weight values across output channels lead to more quantization error
  • Asymmetric/Symmetric per-channel quantization
  • Weight equalization techniques to minimize the variation [7]
Quantization Analysis
• Weight equalization makes the model quantization friendly
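A sketch of cross-layer equalization, a weight-equalization technique of the kind provided by toolkits such as AIMET [7]. It assumes two consecutive layers separated by a ReLU. Dividing output channel i of the first layer (and its bias) by s_i and multiplying input channel i of the second layer by s_i leaves the network function unchanged, because ReLU is positively homogeneous, while pulling the per-channel ranges closer together. Shapes and gains are hypothetical.

```python
import numpy as np

def cross_layer_equalize(w1, b1, w2):
    """w1: [C, in], b1: [C], w2: [out, C]; assumes a ReLU between the two layers."""
    r1 = np.abs(w1).max(axis=1)  # range of each output channel of layer 1
    r2 = np.abs(w2).max(axis=0)  # range of each input channel of layer 2
    s = np.sqrt(r1 / r2)         # after scaling, both ranges become sqrt(r1 * r2)
    return w1 / s[:, None], b1 / s, w2 * s[None, :]

def spread(r):
    """Ratio of largest to smallest per-channel range."""
    return float(r.max() / r.min())

rng = np.random.default_rng(0)
gains = np.logspace(-2, 1, 8)[:, None]  # channel ranges spanning roughly 0.01 .. 10
w1 = rng.standard_normal((8, 16)) * gains
b1 = rng.standard_normal(8) * gains[:, 0]
w2 = rng.standard_normal((4, 8))
x = rng.standard_normal((32, 16))

w1_eq, b1_eq, w2_eq = cross_layer_equalize(w1, b1, w2)
y_before = np.maximum(x @ w1.T + b1, 0.0) @ w2.T
y_after = np.maximum(x @ w1_eq.T + b1_eq, 0.0) @ w2_eq.T

r1_before = np.abs(w1).max(axis=1)
r1_after = np.abs(w1_eq).max(axis=1)
print("layer-1 channel range spread before:", spread(r1_before))
print("layer-1 channel range spread after :", spread(r1_after))
print("max output difference:", np.abs(y_after - y_before).max())
```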
Quantization Aware Training
• When everything else fails!
• QAT is a fine-tuning process
  • Start with a trained floating-point model, using reduced momentum and learning rate
Quantization Aware Training
• Insert quantization nodes during training [8]
• Simulate quantization using floating-point operations
• Tune quantization parameters during training
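The quantization nodes used in QAT are typically fake-quantize operations. The forward pass quantizes and immediately dequantizes in floating point. The backward pass uses a straight-through estimator (STE), because round() has zero gradient almost everywhere. The NumPy sketch below assumes fixed, hypothetical qparams; QAT frameworks update or learn them during training.

```python
import numpy as np

def fake_quant_forward(x, scale, zero_point, q_min=0, q_max=255):
    """Quantize -> dequantize in float: the 'quantization node' inserted during QAT."""
    q = np.clip(np.round(x / scale) + zero_point, q_min, q_max)
    return (q - zero_point) * scale

def fake_quant_backward(x, grad_out, scale, zero_point, q_min=0, q_max=255):
    """Straight-through estimator: treat round() as identity, so the gradient passes
    through unchanged inside the representable range and is zero where x is clipped."""
    lo = (q_min - zero_point) * scale
    hi = (q_max - zero_point) * scale
    inside = (x >= lo) & (x <= hi)
    return grad_out * inside

scale, zero_point = 0.05, 128  # hypothetical qparams covering [-6.4, 6.35]
x = np.array([-10.0, -1.23, 0.0, 0.51, 7.0])
y = fake_quant_forward(x, scale, zero_point)
g = fake_quant_backward(x, np.ones_like(x), scale, zero_point)
print("forward :", y)
print("backward:", g)
```

The forward output already carries the rounding and clipping error that the int8 deployment will see, so the loss reflects it, while the STE keeps gradients flowing to the float weights.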
Quantization Aware Training
• QAT can achieve < 1% accuracy degradation relative to floating-point inference [2]
Best Practices
• Model selection
  • NAS: efficient architecture for quantization
• Quantization tools
  • Support for quantization schemes and analysis tools
• Calibration dataset
  • Representative dataset with ~100-1000 samples
Best Practices
• Quantization accuracy
  • Evaluate the best-fit quantization scheme for the model
• Quantization analysis
  • Identify potentially problematic layers
• Quantization aware training
  • Fine-tune the model for quantization
References
1. https://www.tensorflow.org/lite/performance/post_training_quantization
2. Quantizing deep convolutional networks for efficient inference: A whitepaper
3. Fixed Point Quantization of Deep Convolutional Networks
4. https://pytorch.org/docs/stable/quantization.html
5. https://deci.ai/resources/achieve-fp32-accuracy-int8-inference-speed/
6. SelectQ: Calibration Data Selection for Post-Training Quantization
7. AI Model Efficiency Toolkit (AIMET)
8. Aspects and best practices of quantization aware training for custom network accelerators
“Omnilert Gun Detect: Harnessing Computer Vision to Tackle Gun Violence,” a P...
Edge AI and Vision Alliance
 
“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...
“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...
“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...
Edge AI and Vision Alliance
 
“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...
“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...
“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...
Edge AI and Vision Alliance
 
“Improved Data Sampling Techniques for Training Neural Networks,” a Presentat...
“Improved Data Sampling Techniques for Training Neural Networks,” a Presentat...“Improved Data Sampling Techniques for Training Neural Networks,” a Presentat...
“Improved Data Sampling Techniques for Training Neural Networks,” a Presentat...
Edge AI and Vision Alliance
 
“Cost-efficient, High-quality AI for Consumer-grade Smart Home Cameras,” a Pr...
“Cost-efficient, High-quality AI for Consumer-grade Smart Home Cameras,” a Pr...“Cost-efficient, High-quality AI for Consumer-grade Smart Home Cameras,” a Pr...
“Cost-efficient, High-quality AI for Consumer-grade Smart Home Cameras,” a Pr...
Edge AI and Vision Alliance
 
“Edge AI Optimization on Rails—Literally,” a Presentation from Wabtec
“Edge AI Optimization on Rails—Literally,” a Presentation from Wabtec“Edge AI Optimization on Rails—Literally,” a Presentation from Wabtec
“Edge AI Optimization on Rails—Literally,” a Presentation from Wabtec
Edge AI and Vision Alliance
 
“How Large Language Models Are Impacting Computer Vision,” a Presentation fro...
“How Large Language Models Are Impacting Computer Vision,” a Presentation fro...“How Large Language Models Are Impacting Computer Vision,” a Presentation fro...
“How Large Language Models Are Impacting Computer Vision,” a Presentation fro...
Edge AI and Vision Alliance
 
“Implementing AI/Computer Vision for Corporate Security Surveillance,” a Pres...
“Implementing AI/Computer Vision for Corporate Security Surveillance,” a Pres...“Implementing AI/Computer Vision for Corporate Security Surveillance,” a Pres...
“Implementing AI/Computer Vision for Corporate Security Surveillance,” a Pres...
Edge AI and Vision Alliance
 
“Continual Learning thru Sequential, Lightweight Optimization,” a Presentatio...
“Continual Learning thru Sequential, Lightweight Optimization,” a Presentatio...“Continual Learning thru Sequential, Lightweight Optimization,” a Presentatio...
“Continual Learning thru Sequential, Lightweight Optimization,” a Presentatio...
Edge AI and Vision Alliance
 
“Multi-object Tracking Systems,” a Presentation from Tryolabs
“Multi-object Tracking Systems,” a Presentation from Tryolabs“Multi-object Tracking Systems,” a Presentation from Tryolabs
“Multi-object Tracking Systems,” a Presentation from Tryolabs
Edge AI and Vision Alliance
 
“Improved Navigation Assistance for the Blind via Real-time Edge AI,” a Prese...
“Improved Navigation Assistance for the Blind via Real-time Edge AI,” a Prese...“Improved Navigation Assistance for the Blind via Real-time Edge AI,” a Prese...
“Improved Navigation Assistance for the Blind via Real-time Edge AI,” a Prese...
Edge AI and Vision Alliance
 
“Using Vision Systems, Generative Models and Reinforcement Learning for Sport...
“Using Vision Systems, Generative Models and Reinforcement Learning for Sport...“Using Vision Systems, Generative Models and Reinforcement Learning for Sport...
“Using Vision Systems, Generative Models and Reinforcement Learning for Sport...
Edge AI and Vision Alliance
 
“Introduction to Cameras for Embedded Applications,” a Presentation from Sens...
“Introduction to Cameras for Embedded Applications,” a Presentation from Sens...“Introduction to Cameras for Embedded Applications,” a Presentation from Sens...
“Introduction to Cameras for Embedded Applications,” a Presentation from Sens...
Edge AI and Vision Alliance
 
“Introduction to Modern Radar for Machine Perception,” a Presentation from Se...
“Introduction to Modern Radar for Machine Perception,” a Presentation from Se...“Introduction to Modern Radar for Machine Perception,” a Presentation from Se...
“Introduction to Modern Radar for Machine Perception,” a Presentation from Se...
Edge AI and Vision Alliance
 
“Diagnosing Problems and Implementing Solutions for Deep Neural Network Train...
“Diagnosing Problems and Implementing Solutions for Deep Neural Network Train...“Diagnosing Problems and Implementing Solutions for Deep Neural Network Train...
“Diagnosing Problems and Implementing Solutions for Deep Neural Network Train...
Edge AI and Vision Alliance
 
“Seeing Through Machines: A Guide to Image Sensors for Edge AI Applications,”...
“Seeing Through Machines: A Guide to Image Sensors for Edge AI Applications,”...“Seeing Through Machines: A Guide to Image Sensors for Edge AI Applications,”...
“Seeing Through Machines: A Guide to Image Sensors for Edge AI Applications,”...
Edge AI and Vision Alliance
 
“Transformer Networks: How They Work and Why They Matter,” a Presentation fro...
“Transformer Networks: How They Work and Why They Matter,” a Presentation fro...“Transformer Networks: How They Work and Why They Matter,” a Presentation fro...
“Transformer Networks: How They Work and Why They Matter,” a Presentation fro...
Edge AI and Vision Alliance
 
“Removing Weather-related Image Degradation at the Edge,” a Presentation from...
“Removing Weather-related Image Degradation at the Edge,” a Presentation from...“Removing Weather-related Image Degradation at the Edge,” a Presentation from...
“Removing Weather-related Image Degradation at the Edge,” a Presentation from...
Edge AI and Vision Alliance
 
“Seeing the Invisible: Unveiling Hidden Details through Advanced Image Acquis...
“Seeing the Invisible: Unveiling Hidden Details through Advanced Image Acquis...“Seeing the Invisible: Unveiling Hidden Details through Advanced Image Acquis...
“Seeing the Invisible: Unveiling Hidden Details through Advanced Image Acquis...
Edge AI and Vision Alliance
 
“Data-efficient and Generalizable: The Domain-specific Small Vision Model Rev...
“Data-efficient and Generalizable: The Domain-specific Small Vision Model Rev...“Data-efficient and Generalizable: The Domain-specific Small Vision Model Rev...
“Data-efficient and Generalizable: The Domain-specific Small Vision Model Rev...
Edge AI and Vision Alliance
 
“Omnilert Gun Detect: Harnessing Computer Vision to Tackle Gun Violence,” a P...
“Omnilert Gun Detect: Harnessing Computer Vision to Tackle Gun Violence,” a P...“Omnilert Gun Detect: Harnessing Computer Vision to Tackle Gun Violence,” a P...
“Omnilert Gun Detect: Harnessing Computer Vision to Tackle Gun Violence,” a P...
Edge AI and Vision Alliance
 
“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...
“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...
“Adventures in Moving a Computer Vision Solution from Cloud to Edge,” a Prese...
Edge AI and Vision Alliance
 
“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...
“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...
“Bridging Vision and Language: Designing, Training and Deploying Multimodal L...
Edge AI and Vision Alliance
 
Ad

Recently uploaded (20)

Canadian book publishing: Insights from the latest salary survey - Tech Forum...
Canadian book publishing: Insights from the latest salary survey - Tech Forum...Canadian book publishing: Insights from the latest salary survey - Tech Forum...
Canadian book publishing: Insights from the latest salary survey - Tech Forum...
BookNet Canada
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of ExchangesJignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah Innovator
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Raffi Khatchadourian
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier VroomAI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
UXPA Boston
 
Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)
Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)
Com fer un pla de gestió de dades amb l'eiNa DMP (en anglès)
CSUC - Consorci de Serveis Universitaris de Catalunya
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
Build With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdfBuild With AI - In Person Session Slides.pdf
Build With AI - In Person Session Slides.pdf
Google Developer Group - Harare
 
Agentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community MeetupAgentic Automation - Delhi UiPath Community Meetup
Agentic Automation - Delhi UiPath Community Meetup
Manoj Batra (1600 + Connections)
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 
Canadian book publishing: Insights from the latest salary survey - Tech Forum...
Canadian book publishing: Insights from the latest salary survey - Tech Forum...Canadian book publishing: Insights from the latest salary survey - Tech Forum...
Canadian book publishing: Insights from the latest salary survey - Tech Forum...
BookNet Canada
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptxWebinar - Top 5 Backup Mistakes MSPs and Businesses Make   .pptx
Webinar - Top 5 Backup Mistakes MSPs and Businesses Make .pptx
MSP360
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of ExchangesJignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah Innovator
 
Viam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdfViam product demo_ Deploying and scaling AI with hardware.pdf
Viam product demo_ Deploying and scaling AI with hardware.pdf
camilalamoratta
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Challenges in Migrating Imperative Deep Learning Programs to Graph Execution:...
Raffi Khatchadourian
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier VroomAI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
UXPA Boston
 
machines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdfmachines-for-woodworking-shops-en-compressed.pdf
machines-for-woodworking-shops-en-compressed.pdf
AmirStern2
 
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
GDG Cloud Southlake #42: Suresh Mathew: Autonomous Resource Optimization: How...
James Anderson
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 

“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap

  • 1. Practical Approaches to DNN Quantization
      Dwith Chenna, Senior Embedded DSP Eng., Computer Vision, Magic Leap Inc.
  • 2. Contents
      • Why Quantization?
      • Quantization Scheme
      • Types of Quantization
      • Post Training Quantization
      • Quantization Tools
      • Network Architecture
      • Calibration Dataset
      • Min/Max Tuning
      • Quantization Evaluation
      • Quantization Analysis
      • Quantization Aware Training
      • Best Practices
      © 2023 Magic Leap
  • 3. Why Quantization?
      • Quantization is a powerful tool for enabling deep learning on edge devices
      • Edge devices are resource-constrained hardware with limited memory and low power budgets
  • 4. Why Quantization?
      • Model compression: up to 4x smaller network size and memory bandwidth (float32 to int8)
      • Latency reduction: up to 2-3x, because int8 compute is significantly faster than float32 [1]
      • Trade-off: quantization can reduce model accuracy
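  • 5. Quantization Scheme
      • Convert full-precision floating-point numbers to int8 [2]:
        $q = \mathrm{clamp}\left(\mathrm{round}(r/s) + z,\ q_{\min},\ q_{\max}\right)$
        where q is the quantized value, r the real value, s the scale, and z the zero point
      • Map a quantized value back to its floating-point representation:
        $r \approx s\,(q - z)$
      • For a floating-point distribution with range $[r_{\min}, r_{\max}]$, the scale and zero point are obtained as:
        $s = \dfrac{r_{\max} - r_{\min}}{q_{\max} - q_{\min}}, \qquad z = \mathrm{round}\left(q_{\min} - \dfrac{r_{\min}}{s}\right)$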
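Assuming the standard affine form from [2], q = clamp(round(r/s) + z, q_min, q_max) and r ≈ s(q - z), the scheme can be sketched in a few lines of NumPy. This is a minimal illustration rather than the tooling used in the talk: the helper names (`affine_params`, `quantize`, `dequantize`) are hypothetical, the target is assumed to be signed int8 in [-128, 127], and the range is widened to include 0.0 so that a real zero maps exactly onto the zero point.

```python
import numpy as np

Q_MIN, Q_MAX = -128, 127  # signed int8 range (assumed target type)


def affine_params(r_min, r_max):
    """Scale s and zero point z so that [r_min, r_max] maps onto [Q_MIN, Q_MAX]."""
    # Widen the range to include 0.0 so that real zero is exactly representable.
    r_min, r_max = min(float(r_min), 0.0), max(float(r_max), 0.0)
    scale = (r_max - r_min) / (Q_MAX - Q_MIN)
    if scale == 0.0:  # all-zero tensor: any positive scale works
        scale = 1.0
    zero_point = int(round(Q_MIN - r_min / scale))
    return scale, min(max(zero_point, Q_MIN), Q_MAX)


def quantize(r, scale, zero_point):
    """q = clamp(round(r / s) + z, Q_MIN, Q_MAX), stored as int8."""
    q = np.round(r / scale) + zero_point
    return np.clip(q, Q_MIN, Q_MAX).astype(np.int8)


def dequantize(q, scale, zero_point):
    """r ~= s * (q - z)."""
    return scale * (q.astype(np.float32) - zero_point)


r = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
s, z = affine_params(r.min(), r.max())
q = quantize(r, s, z)
print(s, z, q, dequantize(q, s, z))
```

For this input the range [-1, 2] gives s = 3/255 and z = -43, the range endpoints land on -128 and 127, and every reconstruction error stays within s/2.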
  • 6. Quantization Scheme: Symmetric
      • Assumes a symmetric distribution for simplicity, so the zero point is 0
      • Symmetric per tensor
          • Calculate one scale for the entire tensor
      • Symmetric per channel
          • Calculate a scale for each channel of the tensor
      • Computationally efficient
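To see why per-channel scales help, the NumPy sketch below quantizes a hypothetical two-channel weight matrix whose channels differ by about 100x in range, once with a single per-tensor scale and once with one scale per output channel. Zero point 0, the restricted range [-127, 127], and output channels on axis 0 are assumptions; the function names are illustrative only.

```python
import numpy as np

Q_MAX = 127  # symmetric int8 uses [-127, 127] with the zero point fixed at 0


def symmetric_scale(w, per_channel, channel_axis=0):
    """s = max|w| / 127, either over the whole tensor or per output channel."""
    if per_channel:
        other_axes = tuple(a for a in range(w.ndim) if a != channel_axis)
        max_abs = np.max(np.abs(w), axis=other_axes, keepdims=True)
    else:
        max_abs = np.max(np.abs(w))
    return np.maximum(max_abs, 1e-12) / Q_MAX


def fake_quant(w, scale):
    """Quantize to int8 with z = 0, then map back to float to measure the error."""
    return np.clip(np.round(w / scale), -Q_MAX, Q_MAX) * scale


rng = np.random.default_rng(0)
# Hypothetical [out=2, in=64] weights: channel 0 is ~100x narrower than channel 1.
w = np.stack([rng.normal(0.0, 0.01, 64), rng.normal(0.0, 1.0, 64)])

for per_channel in (False, True):
    err = np.abs(w - fake_quant(w, symmetric_scale(w, per_channel))).mean(axis=1)
    print("per-channel" if per_channel else "per-tensor ", err)
```

With one shared scale, the narrow channel is mostly rounded to zero; per-channel scales leave the wide channel's error unchanged and cut the narrow channel's error by a large factor.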
  • 7. Quantization Scheme: Asymmetric
      • Accounts for shifts in the distribution, giving better utilization of the quantization range
      • Asymmetric per tensor
          • One scale and zero point for the entire tensor
      • Asymmetric per channel
          • A scale and zero point for each channel of the tensor
      • Better handling of diverse distributions
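The range-utilization argument for asymmetric quantization is easiest to see on non-negative data such as post-ReLU activations. The NumPy sketch below (synthetic data, illustrative names) compares the mean absolute quantization error of a symmetric int8 mapping, which never uses its negative codes on such data, with an asymmetric mapping that spreads all 256 codes over [0, max].

```python
import numpy as np


def mean_abs_quant_error(x, scale, zero_point, q_min, q_max):
    """Mean |x - dequantize(quantize(x))| for a given int8 mapping."""
    q = np.clip(np.round(x / scale) + zero_point, q_min, q_max)
    return float(np.abs(x - scale * (q - zero_point)).mean())


rng = np.random.default_rng(0)
x = np.maximum(rng.normal(1.0, 1.0, 10_000), 0.0)  # post-ReLU activations: never negative

# Symmetric: z = 0 and range [-max|x|, max|x|], so the negative codes go unused.
s_sym = float(np.abs(x).max()) / 127
err_sym = mean_abs_quant_error(x, s_sym, 0, -127, 127)

# Asymmetric: the range [min(x, 0), max(x)] is spread over all 256 codes.
lo, hi = min(float(x.min()), 0.0), float(x.max())
s_asym = (hi - lo) / 255
z_asym = int(round(-128 - lo / s_asym))
err_asym = mean_abs_quant_error(x, s_asym, z_asym, -128, 127)

print(f"symmetric: {err_sym:.5f}  asymmetric: {err_asym:.5f}")
```

Here the asymmetric step is roughly half the symmetric one, and the mean error drops by roughly the same factor.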
  • 9. Types of Quantization
      • Post Training Quantization (PTQ)
          • Simple yet efficient
          • Uses an already trained model and a calibration dataset
      • Quantization Aware Training (QAT)
          • Emulates inference-time quantization
          • Resource intensive, as it requires retraining
  • 10. Post Training Quantization
      • Dynamic quantization
          • Weights are quantized ahead of time
          • Activations are quantized during inference (dynamic)
      • Static quantization
          • Weights and activations are quantized
          • Saves memory bandwidth and compute
          • Needs a representative dataset
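The difference between dynamic and static activation quantization comes down to when the activation range is measured. In the minimal NumPy sketch below (hypothetical helper names, asymmetric int8 assumed), dynamic quantization recomputes the scale and zero point from each input, while the static quantizer fixes them once from calibration batches.

```python
import numpy as np


def int8_params(lo, hi):
    """Asymmetric int8 scale/zero point for the range [lo, hi] (0 always included)."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = max(hi - lo, 1e-12) / 255
    return scale, int(round(-128 - lo / scale))


def quantize_int8(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)


def quantize_dynamic(x):
    """Dynamic PTQ: the activation range is measured on each input at inference time."""
    scale, zp = int8_params(float(x.min()), float(x.max()))
    return quantize_int8(x, scale, zp), scale, zp


class StaticActivationQuantizer:
    """Static PTQ: the activation range is fixed ahead of time from calibration batches."""

    def __init__(self, calibration_batches):
        lo = min(float(b.min()) for b in calibration_batches)
        hi = max(float(b.max()) for b in calibration_batches)
        self.scale, self.zero_point = int8_params(lo, hi)

    def __call__(self, x):
        # No per-input statistics: values outside the calibrated range saturate.
        return quantize_int8(x, self.scale, self.zero_point)


rng = np.random.default_rng(0)
calib = [rng.uniform(0.0, 4.0, size=1000) for _ in range(8)]  # hypothetical calibration batches
static_q = StaticActivationQuantizer(calib)

x = np.array([0.5, 2.0, 6.0])  # 6.0 lies outside the calibrated range
q_dyn, s_dyn, z_dyn = quantize_dynamic(x)
print(q_dyn, s_dyn, static_q(x), static_q.scale)
```

Static mode avoids computing ranges at runtime, but it relies on the calibration data covering the values seen at inference: here 6.0 is clipped to the top code, whereas the dynamic path simply picks a coarser scale.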
  • 11. Post Training Quantization
      • Which quantization scheme is best for deep neural networks?
      • Weights: symmetric per channel
          • The static distribution makes weights easy to quantize
          • Weight distributions tend to be symmetric [3]
          • Symmetric per-channel quantization handles the diversity of weight distributions across channels
      [Figure: empirical weight distribution in a pre-trained network]
  • 12. Post Training Quantization
      • Activations: asymmetric/symmetric per tensor
          • The distribution changes with every inference, which makes it difficult to collect statistics
          • The distribution is approximated with a representative (calibration) dataset
          • Batch normalization produces distributions that are better suited to quantization
  • 14. Quantization Tools: TFLite
      • TFLite supports 8-bit integer PTQ [1]
      • Quantization scheme
          • Weights: symmetric per channel
          • Activations: asymmetric per tensor
      • Quantization analysis
          • Selective quantization with mixed precision (float32/16 + int8/int16)
          • Layer-wise quantization error with custom metrics
  • 15. Quantization Tools: PyTorch
      • PyTorch supports 8-bit integer PTQ [4]
      • Quantization scheme
          • Weights: (a)symmetric per tensor/channel
          • Activations: (a)symmetric per tensor/channel
      • Quantization analysis
          • Layer-wise quantization error through custom metrics
  • 17. Network Architecture
      • FLOPS is not everything!
      • Neural Architecture Search (NAS)
          • Most NAS-based models (e.g., EfficientNet) try to minimize compute
          • This results in deeper, leaner networks that work well on cache-based systems
  • 18. Network Architecture
      • Efficient architectures for quantization [5]
  • 19. Network Architecture
      • Quantization aware
          • Larger models have redundancy, which makes them more robust to quantization
      • Quantization scheme
          • Enables the use of simpler, more efficient quantization schemes
  • 20. Network Architecture
      • Optimization toolchain
          • Aggressive layer fusion for optimal memory bandwidth
          • Optimal quantization parameter selection
      • Hardware
          • Better suited to the target hardware: CPU/GPU/DSP/accelerator
  • 22. Calibration Dataset
      • A representative dataset is used to estimate the activation distributions
      • It needs to cover the diversity of the use case
      • Size: ~100-1000 images are statistically significant [6]
  • 23. Min/Max Tuning
      • Minimize quantization error and eliminate outliers
      • Trade-off: range vs. quantization error
      • Mean/standard deviation
          • Assumes a normal distribution
          • Min/Max: mean ± 3·STD
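The range vs. quantization-error trade-off of mean ± 3·STD tuning can be illustrated with a synthetic tensor made of standard-normal values plus a few large outliers. This NumPy sketch (illustrative names; for brevity the zero point is not rounded to an integer) compares the error of the plain min/max range and the mean ± 3·STD range, split into inliers and outliers.

```python
import numpy as np


def minmax_range(x):
    return float(x.min()), float(x.max())


def mean_std_range(x, k=3.0):
    """Clip range mean +/- k*std; assumes a roughly normal distribution."""
    mu, sigma = float(x.mean()), float(x.std())
    return mu - k * sigma, mu + k * sigma


def fake_quant(x, lo, hi, levels=256):
    """Uniform 8-bit quantize/dequantize over [lo, hi]; values outside are clipped."""
    step = (hi - lo) / (levels - 1)
    return lo + np.clip(np.round((x - lo) / step), 0, levels - 1) * step


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 10_000), np.full(5, 50.0)])  # 5 hypothetical outliers
inlier = np.abs(x) < 10

for name, (lo, hi) in [("min/max", minmax_range(x)), ("mean+/-3std", mean_std_range(x))]:
    err = np.abs(x - fake_quant(x, lo, hi))
    print(f"{name:12s} inlier error {err[inlier].mean():.4f}  outlier error {err[~inlier].mean():.2f}")
```

The 3·STD range gives the bulk of the values a much finer step, at the cost of clipping the outliers; which side of the trade-off matters more depends on the layer.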
  • 24. Min/Max Tuning
      • Histogram: ignore the last x% of values
      • Moving average (TensorFlow default)
      • Search for the min/max (PyTorch/TensorRT)
      • Find the histogram range that covers most of the entropy
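Two of these range estimators are sketched below in NumPy: a histogram-based clip that ignores the top x% of |x|, and an exponential moving average of per-batch min/max. Both are simplified illustrations, not the exact TensorFlow, PyTorch, or TensorRT implementations; the names, bin count, and momentum value are assumptions.

```python
import numpy as np


def histogram_clip_range(x, ignore_pct=0.01, bins=2048):
    """Symmetric range [-t, t] from a histogram of |x|, ignoring the top ignore_pct % of values."""
    hist, edges = np.histogram(np.abs(x), bins=bins)
    cdf = np.cumsum(hist) / hist.sum()
    idx = int(np.searchsorted(cdf, 1.0 - ignore_pct / 100.0))  # first bin reaching the target mass
    t = float(edges[idx + 1])                                  # upper edge of that bin
    return -t, t


class MovingAverageMinMax:
    """Exponential moving average of per-batch min/max (momentum is an assumed default)."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.lo = self.hi = None

    def update(self, batch):
        lo, hi = float(batch.min()), float(batch.max())
        if self.lo is None:
            self.lo, self.hi = lo, hi
        else:
            m = self.momentum
            self.lo = m * self.lo + (1.0 - m) * lo
            self.hi = m * self.hi + (1.0 - m) * hi
        return self.lo, self.hi


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100_000), [40.0, -35.0]])  # two hypothetical outliers
print(histogram_clip_range(x, ignore_pct=0.01))  # much tighter than max|x| = 40

ema = MovingAverageMinMax()
for _ in range(20):
    ema.update(rng.normal(0.0, 1.0, 1_000))
print(ema.lo, ema.hi)
```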
  • 26. Quantization Evaluation
      • Evaluate which quantization scheme best fits the model [2]
          • ResNet50: symmetric per tensor
          • MobileNet: asymmetric per channel
  • 27. Quantization Evaluation
      • Effect of the quantization scheme on model accuracy [2]
      [Table: classification accuracy of the quantized models]
  • 29. Quantization Analysis
      • What to do when quantization fails?
          • Check that each layer is supported for quantization
          • Identifying a few problematic layers can significantly improve performance
      • Common pitfalls
          • Handling input/output quantization
          • Layer fusion before quantization
  • 30. Quantization Analysis
      • Analyze the sensitivity of individual layers to quantization [1]
      • Selective quantization: mixed-precision inference for testing
      [Figures: layer number (x-axis) vs. activation range (y-axis); root mean square error (RMSE) vs. activation range]
      Many layers have wide ranges, and some layers have high RMSE/scale values.
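A layer-wise sensitivity check in the spirit of this slide can be sketched without any framework: quantize one layer's weights at a time, keep everything else in float, and measure the RMSE of that layer's output. The NumPy example below uses a hypothetical 3-layer ReLU MLP with a single outlier weight planted in layer 1; it is not the TFLite debugger, and the metric is plain RMSE rather than the RMSE/scale shown in the figure.

```python
import numpy as np


def fake_quant_per_tensor(w, n_bits=8):
    """Symmetric per-tensor quantize/dequantize of a weight matrix."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(float(np.abs(w).max()), 1e-12) / q_max
    return np.clip(np.round(w / scale), -q_max, q_max) * scale


def layerwise_rmse(x, weights, n_bits=8):
    """RMSE of each layer's output when only that layer's weights are quantized.

    Each layer sees the float activations of the previous layers, so errors do
    not accumulate and every layer is scored in isolation.
    """
    errors = []
    for i, w in enumerate(weights):
        y_ref = x @ w
        y_q = x @ fake_quant_per_tensor(w, n_bits)
        errors.append(float(np.sqrt(np.mean((y_q - y_ref) ** 2))))
        x = np.maximum(y_ref, 0.0) if i < len(weights) - 1 else y_ref  # float ReLU path
    return errors


rng = np.random.default_rng(0)
weights = [rng.normal(0.0, 0.1, (16, 16)) for _ in range(3)]  # hypothetical 3-layer ReLU MLP
weights[1][0, 0] = 20.0                                         # one outlier weight in layer 1
x = rng.normal(0.0, 1.0, (512, 16))                             # stand-in for calibration inputs

rmse = layerwise_rmse(x, weights)
print([f"{e:.4f}" for e in rmse])
```

The layer with the planted outlier stands out clearly. Following the selective-quantization idea, such a layer is a candidate for staying in float or for a wider format; `layerwise_rmse(x, weights, n_bits=16)` shows how much its error drops with 16-bit weights.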
  • 31. Quantization Analysis
      • Non-linear activations: precision requirements and quantization support
          • ReLU/ReLU6 are preferred over Sigmoid/LeakyReLU
      • Weight/activation distributions: visualize them or use metrics such as the range
      • Layer fusion before quantization: Conv + BN + ReLU / Conv + BN / Conv + ReLU
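Folding BatchNorm into the preceding convolution is a small piece of algebra: in inference mode, BN(Wx + b) is an affine map per output channel, so it can be absorbed into W and b before quantization, and only the fused weights need to be quantized. The NumPy sketch below (hypothetical function name, inference-mode BN statistics assumed) checks a folded Dense + ReLU against the unfused Dense, BN, ReLU sequence; the same per-output-channel scaling applies to conv kernels shaped [out, in, kh, kw].

```python
import numpy as np


def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the preceding conv/dense layer.

    BN(Wx + b) = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
               = (factor * W) x + (b - mean) * factor + beta,  factor = gamma / sqrt(var + eps)

    w has output channels on axis 0: [out, in] (dense) or [out, in, kh, kw] (conv).
    """
    factor = gamma / np.sqrt(var + eps)
    w_folded = w * factor.reshape((-1,) + (1,) * (w.ndim - 1))
    b_folded = (b - mean) * factor + beta
    return w_folded, b_folded


rng = np.random.default_rng(0)
out_ch, in_ch = 4, 3
w, b = rng.normal(size=(out_ch, in_ch)), rng.normal(size=out_ch)
gamma, beta = rng.uniform(0.5, 2.0, out_ch), rng.normal(size=out_ch)
mean, var = rng.normal(size=out_ch), rng.uniform(0.1, 1.0, out_ch)
x = rng.normal(size=(10, in_ch))

# Unfused reference: Dense, then BN, then ReLU
y_ref = np.maximum(gamma * (x @ w.T + b - mean) / np.sqrt(var + 1e-5) + beta, 0.0)
# Fused: a single Dense (+ ReLU) with folded parameters
w_f, b_f = fold_batchnorm(w, b, gamma, beta, mean, var)
y_fused = np.maximum(x @ w_f.T + b_f, 0.0)
print(np.abs(y_ref - y_fused).max())
```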
  • 32. Quantization Analysis
      • Use a larger bit width for more sensitive layers, e.g., fully connected layers and the network head
          • TFLite supports int16 activations
      • Min/Max tuning: handle outlier weights that make all other weights less precise
  • 33. Quantization Analysis
      • A large difference in weight ranges across output channels leads to more quantization error
          • Asymmetric/symmetric per-channel quantization
          • Weight equalization techniques to minimize the variation [7]
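Weight equalization [7] can be illustrated with the cross-layer equalization rule from the data-free quantization literature, a technique AIMET also provides: for Dense, ReLU, Dense, scale output channel i of the first layer by 1/s_i and input channel i of the second layer by s_i, with s_i = sqrt(r1_i / r2_i), where r1_i and r2_i are the two channel ranges. Because ReLU(z/s) = ReLU(z)/s for s > 0, the network output is unchanged while the per-channel ranges become equal. This is a minimal NumPy sketch with hypothetical names, not the AIMET API.

```python
import numpy as np


def equalize_pair(w1, b1, w2):
    """Cross-layer equalization of Dense(w1, b1) -> ReLU -> Dense(w2).

    Shapes: w1 [hidden, in], b1 [hidden], w2 [out, hidden].
    Uses ReLU(z / s) = ReLU(z) / s for s > 0, so the network output is unchanged.
    """
    r1 = np.abs(w1).max(axis=1)   # range of each output channel of layer 1
    r2 = np.abs(w2).max(axis=0)   # range of each input channel of layer 2
    s = np.sqrt(r1 / r2)          # afterwards both ranges equal sqrt(r1 * r2)
    return w1 / s[:, None], b1 / s, w2 * s[None, :]


def forward(x, w1, b1, w2):
    return np.maximum(x @ w1.T + b1, 0.0) @ w2.T


def channel_spread(w):
    """Ratio between the largest and smallest per-output-channel range."""
    r = np.abs(w).max(axis=1)
    return float(r.max() / r.min())


rng = np.random.default_rng(0)
# Hypothetical layer 1 whose output channels span ranges from ~0.01 to ~10.
w1 = rng.normal(size=(4, 3)) * np.array([0.01, 0.1, 1.0, 10.0])[:, None]
b1 = rng.normal(size=4)
w2 = rng.normal(size=(8, 4))
x = rng.normal(size=(32, 3))

w1e, b1e, w2e = equalize_pair(w1, b1, w2)
print(f"layer-1 channel range spread: {channel_spread(w1):.1f} -> {channel_spread(w1e):.1f}")
print(np.abs(forward(x, w1, b1, w2) - forward(x, w1e, b1e, w2e)).max())
```

After equalization a single per-tensor scale fits layer 1 far better, which is what makes the model more quantization friendly.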
  • 34. Quantization Analysis
      • Weight equalization makes the model quantization friendly
  • 36. Quantization Aware Training
      • When everything else fails!
      • QAT is a fine-tuning process
          • Start from a trained floating-point model, with reduced momentum and learning rate
  • 37. Quantization Aware Training
      • Insert quantization nodes during training [8]
          • Simulate quantization using floating-point operations
          • Tune quantization parameters during training
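The fake-quantization nodes used in QAT can be sketched as a quantize-dequantize step in the forward pass and a straight-through estimator (STE) in the backward pass. The toy NumPy loop below fits a linear model whose weights are fake-quantized on every step; the fixed weight scale, learning rate, and data are assumptions, and learning the quantization parameters themselves (the last point above) is not shown.

```python
import numpy as np


def fake_quant(x, scale, q_min=-127, q_max=127):
    """Forward pass of a fake-quant node: quantize to int8 and dequantize, all in float."""
    return np.clip(np.round(x / scale), q_min, q_max) * scale


def fake_quant_grad(x, grad_out, scale, q_min=-127, q_max=127):
    """Straight-through estimator: treat round() as identity, block gradients where clipped."""
    inside = (x >= q_min * scale) & (x <= q_max * scale)
    return grad_out * inside


# Toy QAT loop: fit y = X @ w_true with weights that are fake-quantized in the forward pass.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.uniform(-2.0, 2.0, size=8)
y = X @ w_true

w = np.zeros(8)           # latent float weights (updated by the optimizer)
scale = 4.0 / 127         # hypothetical fixed weight scale covering [-4, 4]
lr = 0.1
losses = []
for step in range(200):
    w_q = fake_quant(w, scale)                    # weights as the int8 model will see them
    residual = X @ w_q - y
    losses.append(0.5 * np.mean(residual ** 2))
    grad_wq = X.T @ residual / len(X)             # dL/dw_q
    w -= lr * fake_quant_grad(w, grad_wq, scale)  # STE: dL/dw ~= dL/dw_q inside the range
print(losses[0], losses[-1])
```

The optimizer updates the float weights, but the loss is always computed with their quantized values, so training settles on weights that perform well after quantization.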
  • 38. Quantization Aware Training
      • QAT can achieve < 1% accuracy degradation relative to floating-point inference [2]
  • 40. Best Practices
      • Model selection
          • NAS: efficient architectures for quantization
      • Quantization tools
          • Support for quantization schemes and analysis tools
      • Calibration dataset
          • Representative dataset with ~100-1000 samples
  • 41. Best Practices
      • Quantization accuracy
          • Evaluate the best-fit quantization scheme for the model
      • Quantization analysis
          • Identify potentially problematic layers
      • Quantization aware training
          • Fine-tune the model for quantization
  • 42. References
      1. https://www.tensorflow.org/lite/performance/post_training_quantization
      2. Quantizing deep convolutional networks for efficient inference: A whitepaper
      3. Fixed Point Quantization of Deep Convolutional Networks
      4. https://pytorch.org/docs/stable/quantization.html
      5. https://deci.ai/resources/achieve-fp32-accuracy-int8-inference-speed/
      6. SelectQ: Calibration Data Selection for Post-Training Quantization
      7. AI Model Efficiency Toolkit (AIMET)
      8. Aspects and best practices of quantization aware training for custom network accelerators