
Optimizing Deep Learning Models for Edge Device Deployment
(Quantization and Pruning)
Table of Contents
↓ Introduction
↓ Need for Edge Deployment
↓ Edge Deployment Process
↓ Techniques
↓ Pruning
↓ Quantization
↓ Tools for Edge Deployment
Introduction
➢ Edge AI: Bringing deep learning models to edge devices
➢ Challenges: Limited compute, memory, and power constraints
➢ Importance of optimization for real-time, low-latency applications
Necessity of Edge Deployment
Low Latency & Real-Time Processing
• Edge devices enable real-time decision-making by processing data locally, reducing dependency on cloud servers.
• This is crucial for applications like autonomous vehicles, medical diagnostics, and industrial automation.
Privacy & Security
• By keeping data processing on the device, edge deployment minimizes the risk of data breaches and enhances privacy compliance.
• This is essential for sensitive applications like healthcare and finance.
Reduced Bandwidth & Power Consumption
• Transmitting large amounts of data to the cloud is costly and power-intensive.
• Edge AI optimizes resource usage by processing data locally, reducing bandwidth consumption and making AI feasible for IoT and battery-powered devices.
Edge Deployment Process
Edge Deployment Pipeline
Key Optimization Aspects for Edge AI

• Model Compression (Pruning, Quantization, Knowledge Distillation)
• Efficient Architectures (MobileNets, EfficientNets, Transformers for Edge)
• Hardware Acceleration (TPUs, NPUs, FPGAs)
• Inference Optimization (TFLite, ONNX, TensorRT, EdgeTPU)


Training vs Inference
Example: XOR gate
Mathematical Calculations
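The slide's worked figures are not reproduced here. As a stand-in, here is a minimal sketch of the same idea in Keras (the layer sizes, optimizer, and epoch count are illustrative assumptions, not values from the slides): training fits weights to the XOR truth table, and inference is a single forward pass with the learned weights.

```python
import numpy as np
import tensorflow as tf

# XOR truth table: inputs and expected outputs.
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Training: iteratively adjust weights to fit the data.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=500, verbose=0)

# Inference: a single forward pass using the learned weights.
print(model.predict(x, verbose=0).round())
```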
Pruning
• Model pruning refers to removing unimportant parameters from a deep learning model to reduce the model size and enable more efficient inference.
• Generally, only the weights are pruned, leaving the biases untouched, since pruning biases tends to have much more significant downsides for accuracy (see the sketch below).

https://www.datature.io/blog/a-comprehensive-guide-to-neural-network-model-pruning
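As an illustration of the idea, here is a minimal magnitude-pruning sketch in NumPy (the weight matrix and the 50% sparsity target are assumptions for demonstration, not values from the text): weights whose absolute value falls below a percentile threshold are zeroed, while biases are left untouched.

```python
import numpy as np

# Stand-in for a trained layer's FP32 weight matrix.
rng = np.random.default_rng(0)
weights = rng.normal(size=(128, 64)).astype(np.float32)

# Magnitude pruning: zero out the 50% of weights with the
# smallest absolute values; biases are deliberately not pruned.
sparsity = 0.5
threshold = np.quantile(np.abs(weights), sparsity)
mask = np.abs(weights) >= threshold
pruned_weights = weights * mask

print(f"Sparsity achieved: {1 - mask.mean():.2%}")
```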
Need for Pruning
Pruning removes redundant weights in neural networks to enhance efficiency. Key benefits include:
• Reduced Model Size: Eliminates insignificant connections, lowering storage and memory requirements.
• Faster Inference: Sparse models require fewer computations, improving execution speed.
• Lower Power Consumption: Reduces computational load, making inference energy-efficient.
• Edge Deployment: Enables deployment on resource-limited devices by optimizing model complexity.
Pruning maintains model performance while significantly improving efficiency for real-world applications.
Pruning
Pruning Process
Techniques
There are two main approaches to pruning neural networks (a code sketch follows the list):
• Train-Time Pruning: Pruning is integrated into the training process, typically by applying L1 or L2 regularization to encourage sparsity in the weights. This helps the model learn a more compact representation while training.
• Post-Training Pruning: The model is fully trained first, and then pruning is applied to remove less significant weights. This method does not influence the training process and is commonly used to optimize pre-trained models for deployment.
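A minimal sketch of schedule-driven magnitude pruning, assuming the TensorFlow Model Optimization toolkit (tensorflow_model_optimization) is installed and compatible with the installed TF/Keras version; the model architecture, sparsity target, and dummy data are illustrative placeholders:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model and data; in practice, use the real network/dataset.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
x_train = tf.random.uniform((32, 8))
y_train = tf.random.uniform((32, 1))

# Wrap the model so low-magnitude weights are zeroed during training,
# ramping sparsity from 0% to 50% over the first 1000 steps.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5,
    begin_step=0, end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=schedule)

pruned_model.compile(optimizer="adam", loss="binary_crossentropy")
# UpdatePruningStep advances the pruning schedule each batch.
pruned_model.fit(x_train, y_train, epochs=1,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```

The same wrapper can also be applied to an already-trained model and briefly fine-tuned, which covers the post-training case.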
Quantization
✓ Quantization refers to the process of approximating a model's parameters (weights and activations) using lower-precision data types, such as 8-bit integers (INT8), instead of the commonly used 32-bit floating-point numbers (FP32).
✓ The primary goal is to improve the efficiency of deep learning models by reducing the number of bits required for computations (a worked numeric example follows).
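As a worked illustration of one common scheme (asymmetric affine quantization with min-max calibration, which is an assumption here; the slides do not specify the mapping), the sketch below quantizes a few FP32 values to INT8 and dequantizes them to check the round-trip error:

```python
import numpy as np

# Asymmetric affine quantization: q = round(x / scale) + zero_point.
x = np.array([-1.8, -0.5, 0.0, 0.7, 2.3], dtype=np.float32)

# Derive scale and zero point from the observed value range (min-max).
qmin, qmax = -128, 127
scale = (x.max() - x.min()) / (qmax - qmin)
zero_point = int(round(qmin - x.min() / scale))

q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
x_hat = (q.astype(np.float32) - zero_point) * scale  # dequantize

print("quantized (INT8):", q)
print("max round-trip error:", np.abs(x - x_hat).max())
```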
Need for Quantization
Quantization is essential for deploying deep learning models on resource-constrained devices. Key benefits include:
• Reduced Memory Footprint: Lower-precision formats (e.g., INT8 vs. FP32) minimize storage needs.
• Faster Inference: Low-precision arithmetic speeds up computations, especially on specialized hardware like TPUs.
• Lower Power Consumption: Reduces energy use, making it ideal for battery-powered devices.
• Edge Deployment: Enables efficient on-device inference without reliance on cloud resources.
Quantization enhances efficiency, making deep learning models lightweight and scalable for real-world applications.
Quantization Process
Verification after Quantization
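The process and verification figures are not reproduced here. Below is a minimal sketch of post-training (dynamic range) quantization with the TensorFlow Lite converter, followed by a check that the quantized model's output stays close to the FP32 output; the tiny model and random input are placeholders:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model; in practice this is the trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(1),
])

# Post-training quantization via the TFLite converter
# (Optimize.DEFAULT applies dynamic range quantization to weights).
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Verification: compare FP32 and quantized outputs on a sample input.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

sample = np.random.rand(1, 3).astype(np.float32)
interpreter.set_tensor(inp["index"], sample)
interpreter.invoke()
print("quantized output:", interpreter.get_tensor(out["index"]))
print("fp32 output:     ", model(sample).numpy())
```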
Techniques
There are two main approaches to quantizing neural networks:
1. Post-Training Quantization (PTQ)
   • Applied after model training without modifying learned weights.
   • Converts weights and activations from FP32 to lower precision (e.g., INT8).
   • Types:
     ◦ Dynamic Quantization (only weights are quantized; activations stay FP32).
     ◦ Static Quantization (both weights and activations are quantized using calibration).
   • Advantage: Faster inference with minimal retraining.
2. Quantization-Aware Training (QAT)
   • Simulates quantization effects during training so the model learns to adapt.
   • Weights and activations remain in FP32 during training but use lower precision (e.g., INT8) at inference.
   • Advantage: Higher accuracy than PTQ, especially for complex models.
Both methods are widely used in real-world applications, with PTQ preferred for efficiency and QAT for maintaining accuracy (a QAT sketch follows).
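A minimal QAT sketch, assuming the TensorFlow Model Optimization toolkit is available; the architecture and dummy data are illustrative placeholders. quantize_model inserts fake-quantization ops so training sees INT8 rounding and clipping effects while the underlying weights stay FP32:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model and data; in practice, use the real network/dataset.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
x_train = tf.random.uniform((64, 8))
y_train = tf.random.uniform((64,), maxval=10, dtype=tf.int32)

# Wrap the model with fake-quantization ops; training then "sees"
# INT8 effects while weights remain FP32 until final conversion.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy")
qat_model.fit(x_train, y_train, epochs=1, verbose=0)

# After fine-tuning, convert to a real INT8 model with the TFLite
# converter, as in the PTQ example above.
```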
Conclusion
• TensorFlow Lite package
• Edge deployment
