KD-Lib: A PyTorch library for Knowledge Distillation, Pruning and Quantization
Het Shah,1 Avishree Khare,2* Neelay Shah,3* Khizir Siddiqui4*
{f201700931, f201701122, f201804003, f201804394}@goa.bits-pilani.ac.in
* Equal contribution
arXiv:2011.14691v1 [cs.LG] 30 Nov 2020
Abstract

In recent years, the growing size of neural networks has led to a vast amount of research concerning compression techniques to mitigate the drawbacks of such large sizes. Most of these research works can be categorized into three broad families: Knowledge Distillation, Pruning, and Quantization. While there has been steady research in this domain, adoption and commercial usage of the proposed techniques has not quite progressed at the same rate. We present KD-Lib, an open-source PyTorch based library, which contains state-of-the-art modular implementations of algorithms from the three families on top of multiple abstraction layers. KD-Lib is model and algorithm-agnostic, with extended support for hyperparameter tuning using Optuna and Tensorboard for logging and monitoring. The library can be found at https://github.com/SforAiDl/KD_Lib

Introduction

Deep neural networks (DNNs) have gained widespread popularity in recent years, finding use in several domains including computer vision, natural language processing, human-computer interaction and more. These networks have achieved remarkable results on several tasks, often even surpassing human-level performance.

The number of parameters of such DNNs often increases multi-fold with an increase in their representation capacity, limiting the deployment capabilities and hence the commercial feasibility of these networks. This limitation warrants the need for efficient compression techniques that can shrink the networks in size while ensuring that the drop in performance is minimal. In this paper, we restrict our focus to three widely-used compression techniques: Knowledge Distillation, Network Pruning and Quantization.

Knowledge Distillation (Hinton, Vinyals, and Dean 2015) is a compression paradigm that leverages the capability of large neural networks (called teacher networks) to transfer knowledge to smaller networks (called student networks). While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized; it can be computationally just as expensive to evaluate a model even if it utilizes little of its knowledge capacity. Knowledge distillation aims to transfer knowledge from a large model to a smaller model without loss of validity. Several advancements have been witnessed in the development of richer knowledge distillation algorithms, attempting to reduce the difference in test accuracies of the teacher and the student. These algorithms are model-agnostic and hence can be used for a wide variety of network architectures.

While knowledge distillation attempts to train an equally-competent smaller network, network pruning (LeCun et al. 1990) attempts to reduce the size of the existing network by removing unimportant weights. Different pruning techniques differ in the choice of weights to eliminate and in the methods used to do so. Pruning can help in reducing the size of the network by up to 90% with minimal loss in performance. Some approaches have also been empirically shown to result in faster training of the pruned network along with a higher test accuracy (Frankle and Carbin 2018).

Quantization is another way to compress neural networks by reducing the number of bits used to store the weights. As the weights of a network are usually stored as 32-bit floating-point values (FP32), reducing the precision to 8-bit integer values (INT8) reduces the size of the network by a factor of four. Several approaches have been developed to quantize networks with minimal loss in performance.

These compression techniques have become extremely popular in recent years and are actively being researched. New algorithms proposed in research papers can be difficult to understand and implement, especially for potential users in a non-academic setting, thereby limiting their commercial usage.
To the best of our knowledge, there does not exist an umbrella framework containing implementations of state-of-the-art algorithms in Knowledge Distillation, Pruning and Quantization. In this paper, we present KD-Lib, a comprehensive PyTorch based library for model compression. KD-Lib aims to bridge the gap between research and widespread use of model compression techniques. We envision that such a framework would be helpful to researchers as well, providing them a tool to build upon existing algorithms and helping them go from idea to implementation faster.

Library                          Knowledge Distillation        Pruning    Quantization
KD-Lib (Ours)                    Present                       Present    Present
Distiller (Zmora et al. 2019)    Present (only 1 algorithm)    Present    Present
AIMET3                           -                             Present    Present
AquVitae1                        Present                       -          -
Distiller2                       Present                       -          -
Table 1: Comparison of various libraries with KD-Lib.
Related Work

We compare KD-Lib with several openly available frameworks and libraries. In our comparison, we do not include libraries that support fewer than two algorithms.

Distiller (Zmora et al. 2019) is the most extensive framework we found, but it primarily focuses on quantization and pruning, with only one knowledge distillation algorithm (Hinton, Vinyals, and Dean 2015). AquVitae1 contains 4 distillation methods but no quantization or pruning algorithms. Similarly, Distiller2 has 11 knowledge distillation techniques but lacks pruning and quantization methods. AIMET3 focuses mainly on quantization and some other relatively less popular model compression techniques such as tensor decomposition.

In our survey, we found no library containing algorithms pertaining to all three of the popular compression paradigms - knowledge distillation, pruning and quantization. Table 1 shows a concise comparison with different frameworks.

1 https://github.com/aquvitae/aquvitae
2 https://github.com/karanchahal/distiller
3 https://github.com/quic/aimet

Features and Algorithms

KD-Lib houses several algorithms proposed in recent years for model compression. The following features have driven the design choices for the library:

• The main aim of KD-Lib is to make model compression algorithms accessible to a wide range of users, and hence the work is fully open-source.
• The library should act as a catalyst for further research in these fields. It should also be extendable to newer algorithms and other model compression fields. Hence, it is designed to be modular, allowing flexible modifications to essential components that can lead to novel algorithms or better extensions to existing algorithms.
• The interface should be easy to use. Hence, the core functionalities (distillation/pruning/quantization) are accessible in a few lines of code.
• As tuning the hyperparameters is essential for optimum performance, KD-Lib provides support for hyperparameter tuning via Optuna. Monitoring and logging support is also provided through Tensorboard.

A brief description of the implemented algorithms is as follows:

• Knowledge Distillation: The algorithms have been divided into two major task-types: Vision and Text. The Vision module currently supports 13 algorithms, while the Text module supports distillation from BERT to LSTM-based networks (Tang et al. 2019).
• Pruning: The library currently supports pruning based on the Lottery Ticket Hypothesis (Frankle and Carbin 2018).
• Quantization: Static Quantization, Dynamic Quantization and Quantization Aware Training (QAT) (Jacob et al. 2018) are currently supported by KD-Lib.

Code Structure

The structure of the library has been designed for efficient use with the following major principles kept in mind:

• The core function of an algorithm can be executed in one line of code. Hence, the classes contain a dedicated method for distillation/pruning/quantization.
• Each module allows extension to newer features and easy modifications. Hence, fluid components of algorithms (loss functions in distillation, for example) can be easily customized.
• Necessary statistics are available wherever needed. Hence, methods dedicated to these are also present (get_pruning_statistics, for example).

Figure 1: Structure of a Distiller (methods: train_teacher, train_student, evaluate, calculate_kd_loss).

Knowledge Distillation algorithms can be accessed as Distiller objects (Figure 1), with at least the mentioned methods. The train_student method distills knowledge from a teacher network to a student network, where the teacher network can optionally be trained using the train_teacher method. The evaluate method can be invoked to test the performance of the student network. The calculate_kd_loss method can be overridden to provide a custom loss function for distillation; this can also be leveraged by researchers to test novel Knowledge Distillation loss functions.
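As a concrete illustration, the sketch below shows how a Distiller might be driven end-to-end. The VanillaKD class name, constructor order and keyword arguments are assumptions made for this example and may differ from the library's exact API; only the train_teacher, train_student, evaluate and calculate_kd_loss methods are taken from the description above, so the documentation should be consulted for the precise signatures.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Hypothetical import; the exact class name and module path may differ between versions.
from KD_Lib.KD import VanillaKD

# CIFAR10 loaders (dataset choice and batch size are illustrative).
transform = transforms.Compose([transforms.ToTensor()])
train_loader = DataLoader(
    datasets.CIFAR10("data", train=True, download=True, transform=transform),
    batch_size=128, shuffle=True)
test_loader = DataLoader(
    datasets.CIFAR10("data", train=False, download=True, transform=transform),
    batch_size=128)

teacher = models.resnet34(num_classes=10)  # larger teacher network
student = models.resnet18(num_classes=10)  # smaller student network
teacher_optimizer = torch.optim.SGD(teacher.parameters(), lr=0.01, momentum=0.9)
student_optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

# Constructor arguments are assumed; check the KD-Lib documentation for the exact signature.
distiller = VanillaKD(teacher, student, train_loader, test_loader,
                      teacher_optimizer, student_optimizer)

distiller.train_teacher(epochs=5)   # optionally train the teacher first
distiller.train_student(epochs=5)   # distill knowledge into the student
distiller.evaluate()                # test the performance of the student network


# Researchers can override calculate_kd_loss to experiment with novel distillation losses.
# The method signature below is an assumption for illustration only.
class CustomKD(VanillaKD):
    def calculate_kd_loss(self, y_pred_student, y_pred_teacher, y_true):
        # Blend a soft-target KL term (temperature 4.0) with hard-label cross-entropy.
        soft = F.kl_div(F.log_softmax(y_pred_student / 4.0, dim=1),
                        F.softmax(y_pred_teacher / 4.0, dim=1),
                        reduction="batchmean")
        hard = F.cross_entropy(y_pred_student, y_true)
        return 0.5 * soft + 0.5 * hard

Keeping the fluid part of the algorithm (the distillation loss) in a single overridable method is what makes the Distiller abstraction easy to extend to new objectives without touching the training loop.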
Pruning algorithms have been implemented as Pruner objects (Figure 2). Each Pruner object can access the prune method for pruning the network. Additionally, the get_pruning_statistics method can be used to obtain information about the weights of the network after pruning (the percentage of the network pruned, for example); a usage sketch follows Table 3.

Figure 2: Structure of a Pruner (methods: prune, get_pruning_statistics).

Pruning Epoch    % Model Pruned    Accuracy
1                0.0               0.9878
2                0.10              0.9891
3                0.19              0.9890

Table 3: Pruning percentage and accuracy of a ResNet18 model on MNIST using Lottery Ticket Pruning (Frankle and Carbin 2018). Each pruning epoch consists of 5 training epochs. 'Model pruned' is the percentage of the model pruned and 'Accuracy' is the corresponding accuracy at the end of the epoch.
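The following is a minimal sketch of the Pruner interface. The LotteryTicketsPruner class name and the constructor and prune() arguments are hypothetical placeholders; only the prune and get_pruning_statistics methods come from the description above.

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Hypothetical import and class name for a Lottery-Ticket-style pruner.
from KD_Lib.Pruning import LotteryTicketsPruner

transform = transforms.Compose([transforms.ToTensor()])
train_loader = DataLoader(
    datasets.CIFAR10("data", train=True, download=True, transform=transform),
    batch_size=128, shuffle=True)
test_loader = DataLoader(
    datasets.CIFAR10("data", train=False, download=True, transform=transform),
    batch_size=128)

model = models.resnet18(num_classes=10)
loss_fn = torch.nn.CrossEntropyLoss()

# Constructor and prune() arguments are assumptions; consult the documentation.
pruner = LotteryTicketsPruner(model, train_loader, test_loader, loss_fn)
pruner.prune(num_iterations=3, train_epochs=5)   # iterative train-prune-rewind cycles
print(pruner.get_pruning_statistics())           # e.g. fraction of weights pruned, accuracy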
Figure 3: Structure of a Quantizer (methods: quantize, get_performance_statistics, get_model_sizes).

Quantization algorithms can be accessed via Quantizer objects (Figure 3). The quantize method can be used for quantization (with differing implementations for different algorithms). Additionally, the get_model_sizes method can be used to compare the sizes of the model before and after quantization, and the get_performance_statistics method can be used to compare test-times and error metrics for the two networks; a usage sketch is shown below.

The documentation for the library4 has the description of all classes and selected tutorials with example code snippets.

4 https://kd-lib.readthedocs.io/
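The sketch below illustrates the Quantizer interface using dynamic quantization as an example. The DynamicQuantizer class name and constructor arguments are assumptions for illustration; only the quantize, get_model_sizes and get_performance_statistics methods are taken from the description above.

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Hypothetical import and class name; the exact API may differ.
from KD_Lib.Quantization import DynamicQuantizer

transform = transforms.Compose([transforms.ToTensor()])
test_loader = DataLoader(
    datasets.CIFAR10("data", train=False, download=True, transform=transform),
    batch_size=128)

model = models.resnet18(num_classes=10)  # a previously trained FP32 model

# Constructor arguments are assumed; the test loader is used to measure
# accuracy and test-time before and after quantization.
quantizer = DynamicQuantizer(model, test_loader)

quantized_model = quantizer.quantize()   # produce the reduced-precision (e.g. INT8) model
quantizer.get_model_sizes()              # compare model sizes before and after quantization
quantizer.get_performance_statistics()   # compare test-times and error metrics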
Benchmarks

We summarize benchmark results for some of the algorithms implemented in KD-Lib in Tables 2, 3 and 4.

Algorithm                                         Accuracy
None                                              0.57
DML (Zhang et al. 2018)                           0.62
Self Training (Yun et al. 2020)                   0.61
Messy Collab (Arani, Sarfraz, and Zonooz 2019)    0.60
Noisy Teacher (Sau and Balasubramanian 2016)      0.59
TAKD (Mirzadeh et al. 2019)                       0.59
RCO (Jin et al. 2019)                             0.58
Probability Shift (Wen, Lai, and Qian 2019)       0.58

Table 2: Accuracies of networks trained with some of the knowledge distillation algorithms packaged in KD-Lib, on the CIFAR10 dataset. All models were trained with the same hyperparameter set to ensure a fair comparison. We consider ResNet34 as the teacher network (with an accuracy of 0.63) and report accuracies for the student network (ResNet18). 'None' refers to a ResNet18 model trained from scratch without any model compression algorithm. The compression ratio for all of the knowledge distillation algorithms is 50.7%.

Algorithm    % Size Change    BA      NA
Static       -0.75            0.72    0.70
QAT          -0.75            0.72    0.71
Dynamic      -0.19            0.70    0.70

Table 4: Comparison of various quantization algorithms. 'BA' (Base Accuracy) is the accuracy of the model before quantization, and 'NA' (New Accuracy) is the accuracy of the model after quantization. '% Size Change' refers to the change in size after quantization. For Static Quantization and QAT, ResNet18 is tested on the CIFAR10 dataset; for Dynamic Quantization, an LSTM is tested on the IMDB dataset.

Conclusion and Future Work

In this paper, we present KD-Lib, an easy-to-use PyTorch-based library for Knowledge Distillation, Pruning and Quantization. KD-Lib is designed to facilitate the adoption of current model compression techniques and to act as a catalyst for further research in this direction. We plan on actively maintaining the library and expanding it to include more algorithms and desirable features (distributed training, for example) in the future. We further plan on extending the library to other domains relevant to the research community, including but not limited to explainability and interpretability in knowledge distillation.

References

Arani, E.; Sarfraz, F.; and Zonooz, B. 2019. Improving Generalization and Robustness with Noisy Collaboration in Knowledge Distillation.
Frankle, J.; and Carbin, M. 2018. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks.
Hinton, G.; Vinyals, O.; and Dean, J. 2015. Distilling the Knowledge in a Neural Network.
Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; and Kalenichenko, D. 2018. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Jin, X.; Peng, B.; Wu, Y.; Liu, Y.; Liu, J.; Liang, D.; Yan, J.; and Hu, X. 2019. Knowledge Distillation via Route Constrained Optimization.
LeCun, Y.; Denker, J. S.; and Solla, S. A. 1990. Optimal Brain Damage. In Advances in Neural Information Processing Systems.
Mirzadeh, S.-I.; Farajtabar, M.; Li, A.; Levine, N.; Matsukawa, A.; and Ghasemzadeh, H. 2019. Improved Knowledge Distillation via Teacher Assistant.
Sau, B. B.; and Balasubramanian, V. N. 2016. Deep Model Compression: Distilling Knowledge from Noisy Teachers.
Tang, R.; Lu, Y.; Liu, L.; Mou, L.; Vechtomova, O.; and Lin, J. 2019. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. CoRR abs/1903.12136.
Wen, T.; Lai, S.; and Qian, X. 2019. Preparing Lessons: Improve Knowledge Distillation with Better Supervision.
Yun, S.; Park, J.; Lee, K.; and Shin, J. 2020. Regularizing Class-Wise Predictions via Self-Knowledge Distillation. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Zhang, Y.; Xiang, T.; Hospedales, T. M.; and Lu, H. 2018. Deep Mutual Learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Zmora, N.; Jacob, G.; Zlotnik, L.; Elharar, B.; and Novik, G. 2019. Neural Network Distiller: A Python Package For DNN Compression Research. URL https://arxiv.org/abs/1910.12232.