Lec03 Pruning I
Lecture 03:
Pruning and Sparsity
Part I
Song Han
Associate Professor, MIT
Distinguished Scientist, NVIDIA
@SongHan_MIT
MIT 6.5940: TinyML and Efficient Deep Learning Computing (https://efficientml.ai)
Today’s AI is too BIG!
[Figure: ImageNet Top-1 accuracy (69-81%) vs. MACs (0-9 billion) for models including MobileNetV1, ShuffleNet, IGCV3-D, MBNetV2, InceptionV2, ResNet-50, DenseNet-121/169/264, DPN-92, ResNet-101, ResNeXt-50/101, InceptionV3, and Xception; marker size indicates #Parameters (2M-64M). Higher accuracy comes at the cost of rapidly growing model size and compute.]
Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey [Deng et al., IEEE 2020]
Efficient Deep Learning Techniques are Essential
Bridges the Gap between the Supply and Demand of Computation
[Figure: model size (#params in billions, log scale) vs. year, 2017-2022. Language models grow from Transformer (0.05B), GPT (0.11B), BERT (0.34B), GPT-2 (1.5B), MegatronLM (8.3B), T-NLG (17B), and GPT-3 (175B) to MT-NLG (530B), while accelerator memory grows only from TPUv2 (16GB) and V100/TPUv3 (32GB) to A100 (40GB and 80GB). Assuming FP16 data, model compression bridges the gap between model size and GPU memory.]
Part 1 of This Course: Efficient Inference
Pruning
Quantization
Knowledge Distillation
MLPerf (the Olympic Games of AI Computing)
Pruning on Large Language Models
• The open division submission on Llama 2 70B: 2.5x speedup while maintaining 99% accuracy.
• Depth pruning: 80 layers -> 32 layers
• Width pruning: 28,672 intermediate dimensions -> 14,336 intermediate dimensions
[Table: Llama 2 70B performance for both the closed division and the open division, measured on a single NVIDIA H200 GPU.]
NVIDIA Blackwell Platform Sets New LLM Inference Records in MLPerf Inference v4.1
Memory is Expensive
Data Movement → More Memory Reference → More Energy
[Figure: relative energy cost: a single memory access consumes roughly 200x the energy of an arithmetic operation (public-domain image).]
Computing's Energy Problem (and What We Can Do About it) [Horowitz, M., IEEE ISSCC 2014]
Neural Network Pruning
• Introduction to Pruning
• What is pruning?
• How should we formulate pruning?
[Figure: a neural network before pruning and after pruning, with some synapses and neurons removed.]
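A common way to formalize pruning (in line with Han et al., NeurIPS 2015) is as constrained optimization over the pruned weights WP, where the L0 norm counts the remaining non-zero weights; a minimal LaTeX sketch:

```latex
% Pruning as constrained optimization (sketch): find pruned weights W_P that
% minimize the task loss subject to a budget of N non-zero parameters.
\arg\min_{W_P} \; L(\mathbf{x}; W_P)
\quad \text{subject to} \quad \lVert W_P \rVert_0 \le N
% \lVert W_P \rVert_0 counts the non-zero weights; the chosen pruning ratio
% (see "Determine the Pruning Ratio" in the outline) determines N.
```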
Neural Network Pruning
• Introduction to Pruning
• What is pruning?
• How should we formulate pruning?
• Determine the Pruning Granularity
• In what pattern should we prune the neural network?
• Determine the Pruning Criterion
• What synapses/neurons should we prune?
• Determine the Pruning Ratio
• What should target sparsity be for each layer?
• Fine-tune/Train Pruned Neural Network
• How should we improve performance of pruned models?
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
Neural Network Pruning
• Introduction to Pruning
• What is pruning?
• How should we formulate pruning?
• Determine the Pruning Granularity
• In what pattern should we prune the neural network?
• Determine the Pruning Criterion
• What synapses/neurons should we prune? (which synapses? which neurons?)
• Determine the Pruning Ratio
• What should target sparsity be for each layer?
• Fine-tune/Train Pruned Neural Network
• How should we improve performance of pruned models?
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
Neural Network Pruning
• Introduction to Pruning
• What is pruning?
• How should we formulate pruning?
• Determine the Pruning Granularity
• In what pattern should we prune the neural network?
• Determine the Pruning Criterion
• What synapses/neurons should we prune?
• Determine the Pruning Ratio (prune 30%? 50%? 70%?)
• What should target sparsity be for each layer?
[Figure, left: synaptic pruning in the human brain across development (newborn, 2-4 years old, adolescence, adult), with roughly 2,500 synapses per neuron [1] and 7,000 synapses per neuron [2] at different stages as connections are formed and then pruned. Right: pruning synapses and pruning neurons in an artificial neural network, and the resulting accuracy loss.]
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
Neural Network Pruning
Make a neural network smaller by removing synapses and neurons
[Figure: pruning a network, and the resulting accuracy loss (y-axis from +0.5% to -2.0%) as more parameters are pruned away; with pruning alone, accuracy eventually drops.]
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
Neural Network Pruning
Make a neural network smaller by removing synapses and neurons
[Figure: accuracy loss (y-axis from +0.5% to -2.0%) vs. parameters pruned away, for pruning alone and for pruning + fine-tuning; fine-tuning recovers most of the accuracy lost by pruning.]
Neural Network | #Parameters Before Pruning | #Parameters After Pruning | #Parameters Reduction | MACs Reduction
AlexNet | 61 M | 6.7 M | 9✕ | 3✕
Efficient Methods and Hardware for Deep Learning [Han S., Stanford University]
Neural Network Pruning
Pruning the NeuralTalk LSTM does not hurt image caption quality.
• Baseline: a basketball player in a white uniform is playing with a ball. → Pruned 90%: a basketball player in a white uniform is playing with a basketball.
• Baseline: a brown dog is running through a grassy field. → Pruned 90%: a brown dog is running through a grassy area.
• Baseline: a man is riding a surfboard on a wave. → Pruned 90%: a man in a wetsuit is riding a wave on a beach.
• Baseline: a soccer player in red is running in the field. → Pruned 95%: a man in a red shirt and black and white black shirt is running through a field.
Efficient Methods and Hardware for Deep Learning [Han S., Stanford University]
Neural Network Pruning
Make a neural network smaller by removing synapses and neurons
[Figure: number of publications on pruning and sparse neural networks per year, 1989-2022, growing from a handful around Optimal Brain Damage (1989) to thousands per year after Deep Compression and EIE. Source: https://github.com/mit-han-lab/pruning-sparsity-publications]
[Image: the abstracts of Optimal Brain Damage (LeCun, Denker and Solla) and Learning Both Weights and Connections for Efficient Neural Network (Han et al.): removing unimportant weights makes networks smaller and easier to deploy on embedded systems.]
Pruning in the Industry
Hardware support for sparsity
Neural Network Pruning
• Introduction to Pruning
• What is pruning?
• How should we formulate pruning?
• Determine the Pruning Granularity
• In what pattern should we prune the neural network?
• Determine the Pruning Criterion
• What synapses/neurons should we prune?
• Determine the Pruning Ratio
• What should target sparsity be for each layer?
• Fine-tune/Train Pruned Neural Network
• How should we improve performance of pruned models?
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
Section 2: Pruning Granularity
Pruning can be performed at different granularities, from structured to non-structured.
Pruning at Different Granularities
A simple example of a 2D weight matrix
Pruning at Different Granularities
A simple example of a 2D weight matrix
[Figure: a weight matrix with preserved and pruned entries at irregular positions.]
Fine-grained/Unstructured
• More flexible pruning index choice
• Hard to accelerate (irregular)
Pruning at Different Granularities
A simple example of a 2D weight matrix
[Figure: unstructured pruning removes individual entries; structured pruning removes entire rows/columns, leaving a smaller dense matrix.]
Fine-grained/Unstructured
• More flexible pruning index choice
• Hard to accelerate (irregular)
Coarse-grained/Structured
• Less flexible pruning index choice (a subset of the fine-grained case)
• Easy to accelerate (just a smaller matrix!)
Pruning at Different Granularities
The case of convolutional layers
• The weights of convolutional layers have 4 dimensions [co, ci, kh, kw]:
• ci: input channels (or channels)
• co: output channels (or filters)
• kh: kernel size height
• kw: kernel size width
Pruning at Different Granularities
The case of convolutional layers
• Some of the commonly used pruning granularities
[Figure: notation: a convolutional weight tensor with co = 3 filters, ci = 2 input channels, and kh = kw = 3 kernels; preserved and pruned weights are highlighted.]
Exploring the granularity of sparsity in convolutional neural networks [Mao et al., CVPR-W]
Pruning at Different Granularities
The case of convolutional layers
• Some of the commonly used pruning granularities
[Figure: pruning patterns on a weight tensor (co = 3, ci = 2, kh = kw = 3), arranged from irregular (fine-grained) to regular (e.g., vector-level, kernel-level, and channel-level sparsity).]
Exploring the granularity of sparsity in convolutional neural networks [Mao et al., CVPR-W]
Pruning at Different Granularities
The case of convolutional layers
• Some of the commonly used pruning granularities: what are the pros and cons of each?
[Figure: the same pruning granularities, from irregular to regular.]
Exploring the granularity of sparsity in convolutional neural networks [Mao et al., CVPR-W]
Pruning at Different Granularities
Let’s look into some cases
• Fine-grained Pruning (the case we showed before)
• Flexible pruning indices
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
Pruning at Different Granularities
Let’s look into some cases
• Fine-grained Pruning (the case we showed before)
• Flexible pruning indices
• Usually larger compression ratio, since we can flexibly find "redundant" weights (we will later discuss how we find them)
Neural Network | #Parameters Before Pruning | #Parameters After Pruning | Reduction
AlexNet | 61 M | 6.7 M | 9✕
Efficient Methods and Hardware for Deep Learning [Han S., Stanford University]
Pruning at Different Granularities
Let’s look into some cases
• Fine-grained Pruning (the case we showed before)
• Flexible pruning indices
• Usually larger compression ratio, since we can flexibly find "redundant" weights (we will later discuss how we find them)
• Can deliver speedup on some custom hardware (e.g., EIE), but not easily on GPUs
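To see why irregular sparsity needs special hardware support, note that fine-grained sparse weights are typically stored in a compressed format such as CSR, which saves memory but requires indirect indexing at compute time; a minimal PyTorch sketch:

```python
import torch

# Fine-grained sparsity is usually stored compressed (here: CSR): only the
# non-zero values plus their column indices and row offsets are kept. This
# saves memory, but turning it into speedup needs hardware/kernels that can
# handle the irregular indexing (e.g., EIE), which stock GPUs do not do easily.
w = torch.tensor([[3., 0., 0., -2.],
                  [0., 0., 0., 0.],
                  [0., 1., 0., -5.]])
w_csr = w.to_sparse_csr()
print(w_csr.values())        # tensor([ 3., -2.,  1., -5.])
print(w_csr.col_indices())   # tensor([0, 3, 1, 3])
print(w_csr.crow_indices())  # tensor([0, 2, 2, 4])
```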
Pruning at Different Granularities
Let’s look into some cases
• Pattern-based Pruning: N:M sparsity
• N:M sparsity means that in every group of M contiguous elements, N of them are pruned
[Figure: a dense weight matrix, before enforcing the N:M pattern.]
Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
Pruning at Different Granularities
Let’s look into some cases
• Pattern-based Pruning: N:M sparsity
• N:M sparsity means that in every group of M contiguous elements, N of them are pruned
• A classic case is 2:4 sparsity (50% sparsity)
Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
Pruning at Different Granularities
Let’s look into some cases
• Pattern-based Pruning: N:M sparsity
• N:M sparsity means that in every group of M contiguous elements, N of them are pruned
• A classic case is 2:4 sparsity (50% sparsity)
• It is supported by NVIDIA's Ampere GPU Architecture, which delivers up to 2x speedup
[Figure: compressed storage for 2:4 sparsity: only the non-zero values are kept, together with 2-bit indices recording their positions within each group of four.]
Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
Pruning at Different Granularities
Let’s look into some cases
• Pattern-based Pruning: N:M sparsity
• N:M sparsity means that in every group of M contiguous elements, N of them are pruned
• A classic case is 2:4 sparsity (50% sparsity)
• It is supported by NVIDIA's Ampere GPU Architecture, which delivers ~2x speedup
• Usually maintains accuracy (tested on a variety of tasks)
Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
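A minimal PyTorch sketch of enforcing the 2:4 pattern offline by keeping the two largest-magnitude weights in every group of four along the input dimension; the `prune_2_to_4` helper is illustrative only, and a production flow would use NVIDIA's sparsity tooling (e.g., TensorRT) plus fine-tuning:

```python
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Enforce the 2:4 pattern: in every contiguous group of 4 elements along
    the input dimension, keep the 2 largest-magnitude weights, zero the rest."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "in_features must be a multiple of 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    keep_idx = groups.abs().topk(2, dim=-1).indices   # 2 largest |w| per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep_idx, 1.0)                  # 1.0 where the weight is kept
    return (groups * mask).reshape(out_features, in_features)

# Toy usage: every group of 4 has at most 2 non-zeros afterwards.
w = torch.randn(8, 16)
w_24 = prune_2_to_4(w)
print((w_24.reshape(8, -1, 4) != 0).sum(dim=-1))      # all entries are <= 2
```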
Pruning at Different Granularities
Let’s look into some cases
• Channel Pruning
• Pro: direct speedup due to reduced channel numbers (leading to an NN with smaller #channels)
• Con: smaller compression ratio
[Figure: channel pruning reduces #channels layer by layer, with a different sparsity per layer (e.g., Layer 0: 0.5, Layer 1: 0.3, Layer 2: 0.7, Layer 3: 0.2, Layer 4: 0.3, ...).]
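As an illustration of why channel pruning gives direct speedup, here is a minimal PyTorch sketch that physically shrinks a Conv2d by dropping the filters with the smallest L2 norm. The `prune_conv_channels` helper and the L2-norm criterion are just one possible choice (Section 3 discusses criteria); in a real network, the following layer's input channels and any BatchNorm parameters must be pruned to match:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_conv_channels(conv: nn.Conv2d, sparsity: float) -> nn.Conv2d:
    """Remove the output channels (filters) with the smallest L2 norm and
    return a physically smaller Conv2d (no masks needed)."""
    # Filter importance: L2 norm over [ci, kh, kw] for each output channel.
    importance = conv.weight.flatten(1).norm(p=2, dim=1)
    n_keep = max(1, int(round(conv.out_channels * (1 - sparsity))))
    keep = importance.topk(n_keep).indices.sort().values
    new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.copy_(conv.weight[keep])
    if conv.bias is not None:
        new_conv.bias.copy_(conv.bias[keep])
    return new_conv  # the next layer's input channels must be pruned to match

conv = nn.Conv2d(64, 128, 3, padding=1)
print(prune_conv_channels(conv, sparsity=0.5))  # Conv2d(64, 64, ...)
```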
Pruning at Different Granularities
Let’s look into some cases
• Channel Pruning
• Pro: direct speedup due to reduced channel numbers (leading to an NN with smaller #channels)
• Con: smaller compression ratio
We will later discuss how to find sparsity ratios.
[Figure: uniform shrink uses the same sparsity (0.3) for every layer, whereas channel pruning uses per-layer sparsity ratios (0.5, 0.3, 0.7, 0.2, 0.3, ...).]
Pruning at Different Granularities
Let’s look into some cases
• Channel Pruning
• Pro: direct speedup due to reduced channel numbers (leading to an NN with smaller #channels)
• Con: smaller compression ratio
[Figure: measured latency (ms) of the uniform-shrink model vs. the channel-pruned model.]
AMC: AutoML for Model Compression and Acceleration on Mobile Devices [He et al., ECCV 2018]
Neural Network Pruning
• Introduction to Pruning
• What is pruning?
• How should we formulate pruning?
• Determine the Pruning Granularity
• In what pattern should we prune the neural network?
• Determine the Pruning Criterion
• What synapses/neurons should we prune? (which synapses? which neurons?)
• Determine the Pruning Ratio
• What should target sparsity be for each layer?
• Fine-tune/Train Pruned Neural Network
• How should we improve performance of pruned models?
Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
Section 3: Pruning Criterion
What synapses and neurons should we prune?
Selection of Synapses to Prune
• When removing parameters from a neural network model,
• the less important the parameters being removed are,
• the better the performance of the pruned neural network is.
Example: a single neuron computes
y = f(∑i wi xi + b),  with f(·) = ReLU(·) and W = [w0, w1, w2] = [10, −8, 0.1]
⟹ y = ReLU(10·x0 − 8·x1 + 0.1·x2)
• If one weight is to be removed, which one should it be?
Magnitude-based Pruning
A heuristic pruning criterion
• Magnitude-based pruning considers weights with larger absolute values to be more important than other weights.
• For element-wise pruning, Importance = |W|
• Example (row-wise pruning with the L1 norm): for the matrix [[3, −2], [1, −5]], the row importances are |3| + |−2| = 5 and |1| + |−5| = 6, so the first row is pruned: [[0, 0], [1, −5]]
• Example (row-wise pruning with the L2 norm): the row importances are √(3² + (−2)²) = √13 and √(1² + (−5)²) = √26, so again the first row is pruned: [[0, 0], [1, −5]]
• In general, for a structural set S of parameters W^(S), the Lp-norm importance is
‖W^(S)‖p = (∑_{i∈S} |wi|^p)^{1/p}
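A minimal PyTorch sketch of element-wise magnitude pruning, using the same toy matrix as above; the `magnitude_prune` helper is illustrative:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a mask that keeps the (1 - sparsity) fraction of weights with the
    largest absolute value (Importance = |W|) and zeroes out the rest."""
    num_prune = int(weight.numel() * sparsity)
    if num_prune == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    # Threshold = magnitude of the num_prune-th smallest |w|.
    threshold = weight.abs().flatten().kthvalue(num_prune).values
    mask = weight.abs() > threshold
    return mask  # apply with weight.data *= mask during/after training

w = torch.tensor([[3., -2.], [1., -5.]])
mask = magnitude_prune(w, sparsity=0.5)
print(w * mask)   # keeps 3 and -5, zeroes out -2 and 1
```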
Scaling-based Pruning
Pruning criterion for filter pruning
• A scaling factor is associated with each filter (i.e., output channel) in convolutional layers
• The scaling factor is multiplied to the output of that channel
• The scaling factors are trainable parameters
• The filters/output channels with small scaling factor magnitudes will be pruned
[Figure: weights, per-channel scaling factors, and activations; e.g., filter N-1 has scaling factor 0.56, and channels with small scaling factors are pruned.]
Learning Efficient Convolutional Networks through Network Slimming [Liu et al., ICCV 2017]
Scaling-based Pruning
Pruning criterion for filter pruning
• A scaling factor is associated with each filter (i.e., output channel) in convolutional layers
• The scaling factors can be reused from the batch normalization layer:
zo = γ · (zi − μℬ) / √(σℬ² + ε) + β
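A minimal PyTorch sketch, assuming the BatchNorm γ is used as the per-channel scaling factor as in Network Slimming, of ranking channels by |γ|. The helper below is hypothetical; the paper additionally trains with L1 regularization on γ to push unimportant scaling factors toward zero:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def select_channels_by_bn_scale(bn: nn.BatchNorm2d, sparsity: float) -> torch.Tensor:
    """Rank output channels by the magnitude of the BatchNorm scaling factor
    gamma and return the indices of the channels to keep."""
    importance = bn.weight.abs()             # gamma, one value per channel
    n_keep = max(1, int(round(importance.numel() * (1 - sparsity))))
    keep = importance.topk(n_keep).indices.sort().values
    return keep  # prune the matching conv filters and the next layer's inputs

bn = nn.BatchNorm2d(8)
with torch.no_grad():
    bn.weight.copy_(torch.tensor([0.9, 0.01, 0.5, 0.02, 0.7, 0.03, 0.56, 0.04]))
print(select_channels_by_bn_scale(bn, sparsity=0.5))   # tensor([0, 2, 4, 6])
```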
Learning Efficient Convolutional Networks through Network Slimming [Liu et al., ICCV 2017]
Second-Order-based Pruning
Minimize the error on the loss function introduced by pruning synapses
• The induced error can be approximated by a Taylor series:
δL = L(x; W) − L(x; WP = W − δW) = ∑i gi·δwi + ½ ∑i hii·δwi² + ½ ∑i≠j hij·δwi·δwj + O(‖δW‖³)
where gi = ∂L/∂wi and hij = ∂²L/(∂wi ∂wj).
• Optimal Brain Damage assumes that the network has converged (so the first-order terms gi·δwi vanish) and that the Hessian is diagonal (so the cross terms hij·δwi·δwj, i ≠ j, can be ignored).
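Under those assumptions the first-order and cross terms drop out, and the change in loss from removing a single weight reduces to a per-weight importance score; a minimal LaTeX sketch of the resulting criterion:

```latex
% With g_i \approx 0 (converged network) and h_{ij} \approx 0 for i \neq j
% (diagonal Hessian), removing weight w_i (\delta w_i = w_i) gives
\delta L_i \;\approx\; \frac{1}{2}\, h_{ii}\, w_i^2 ,
\qquad h_{ii} = \frac{\partial^2 L}{\partial w_i^2}
% so weights with small magnitude and small curvature are pruned first.
```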
[Figure: neuron pruning in a linear layer and the corresponding channel pruning in a convolution layer.]
Percentage-of-Zero-Based Pruning
• ReLU activation will generate zeros in the output activation.
[Figure: example 4×4 post-ReLU output activation maps across several channels; a large fraction of the entries are exactly zero.]
Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures [Hu et al., ArXiv 2017]
Percentage-of-Zero-Based Pruning
• ReLU activation will generate zeros in the output activation.
• Similar to the magnitude of weights, the Average Percentage of Zero activations (APoZ) can be exploited to measure the importance of the neurons.
Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures [Hu et al., ArXiv 2017]
Percentage-of-Zero-Based Pruning
• ReLU activation will generate zeros in the output activation.
• Similar to the magnitude of weights, the Average Percentage of Zero activations (APoZ) can be exploited to measure the importance of the neurons.
• The smaller the APoZ is, the more important the neuron is.
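A minimal PyTorch sketch of computing APoZ per output channel from a batch of post-ReLU activations (NCHW layout assumed; in practice the statistic is averaged over a calibration dataset):

```python
import torch

def average_percentage_of_zeros(activations: torch.Tensor) -> torch.Tensor:
    """APoZ per output channel: fraction of zero entries in the post-ReLU
    activation, averaged over the batch and spatial positions.
    `activations` has shape [batch, channels, height, width]."""
    zeros = (activations == 0).float()
    apoz = zeros.mean(dim=(0, 2, 3))   # one value per channel
    return apoz                        # higher APoZ => less important neuron

acts = torch.relu(torch.randn(16, 4, 8, 8))   # toy post-ReLU activations
print(average_percentage_of_zeros(acts))      # ~0.5 per channel for random input
```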
Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures [Hu et al., ArXiv 2017]
Regression-based Pruning
Minimize reconstruction error of the corresponding layer’s outputs
• Instead of considering the pruning error of the objective function L(x; W), regression-based pruning minimizes the reconstruction error of the corresponding layer's outputs.
[Figure: the original layer computes Z = X·Wᵀ, with input X of shape b × ci, weights W of shape co × ci, and output Z of shape b × co; the pruned layer computes Ẑ = XP·WPᵀ using only the kept input channels and should reconstruct Z as closely as possible.]
Regression-based Pruning
Minimize reconstruction error of the corresponding layer’s outputs
• Let Z = X·Wᵀ = ∑_{c=0}^{ci−1} Xc·Wcᵀ
[Figure: the output decomposes into per-channel contributions: X is split column-wise into X0, X1, X2, X3 and Wᵀ row-wise into W0ᵀ, W1ᵀ, W2ᵀ, W3ᵀ; the pruned layer Ẑ = XP·WPᵀ keeps only a subset of these channel contributions.]
Channel Pruning for Accelerating Very Deep Neural Networks [He et al., ICCV 2017]
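He et al. alternate between selecting which channels to keep (a LASSO problem over per-channel coefficients β) and refitting the remaining weights by least squares so that the pruned layer reproduces the original outputs. The following is a minimal, hypothetical PyTorch sketch of just the least-squares reconstruction step for a linear / 1×1-conv layer, with assumed shapes and a hand-picked `keep` set standing in for the selected channels:

```python
import torch

def reconstruct_pruned_weights(X: torch.Tensor, W: torch.Tensor,
                               keep: torch.Tensor) -> torch.Tensor:
    """Given sampled layer inputs X [N, ci], original weights W [co, ci], and
    the indices of the kept input channels, refit the remaining weights by
    least squares so that X_p @ W_p.T best reconstructs the original output Z."""
    Z = X @ W.T                                   # original outputs, [N, co]
    X_p = X[:, keep]                              # kept input channels, [N, ci']
    # Solve min_{W_p} ||Z - X_p W_p^T||_F^2 (ordinary least squares).
    W_p = torch.linalg.lstsq(X_p, Z).solution.T   # [co, ci']
    return W_p

# Toy usage: prune 2 of 6 input channels of a 4-output linear / 1x1-conv layer.
X = torch.randn(512, 6)
W = torch.randn(4, 6)
keep = torch.tensor([0, 1, 3, 5])                 # e.g., chosen by LASSO/importance
W_p = reconstruct_pruned_weights(X, W, keep)
err = (X[:, keep] @ W_p.T - X @ W.T).norm() / (X @ W.T).norm()
print(f"relative reconstruction error: {err.item():.3f}")
```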
References
1. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey [Deng et al., IEEE 2020]
2. Computing's Energy Problem (and What We Can Do About it) [Horowitz, M., IEEE ISSCC 2014]
3. Optimal Brain Damage [LeCun et al., NeurIPS 1989]
4. Learning Both Weights and Connections for Efficient Neural Network [Han et al., NeurIPS 2015]
5. Efficient Methods and Hardware for Deep Learning [Han S., Stanford University]
6. Peter Huttenlocher (1931–2013) [Walsh, C. A., Nature 2013]
7. Exploring the Granularity of Sparsity in Convolutional Neural Networks [Mao et al., CVPR-W]
8. Accelerating Inference with Sparsity Using the NVIDIA Ampere Architecture and NVIDIA TensorRT
9. AMC: AutoML for Model Compression and Acceleration on Mobile Devices [He et al., ECCV 2018]
10. Learning Structured Sparsity in Deep Neural Networks [Wen et al., NeurIPS 2016]
11. Learning Efficient Convolutional Networks through Network Slimming [Liu et al., ICCV 2017]
12. Pruning Convolutional Filters with First Order Taylor Series Ranking [Wang M.]
13. Importance Estimation for Neural Network Pruning [Molchanov et al., CVPR 2019]
14. Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures [Hu et al., ArXiv 2017]
15. Pruning Convolutional Neural Networks for Resource Efficient Inference [Molchanov et al., ICLR 2017]
16. Channel Pruning for Accelerating Very Deep Neural Networks [He et al., ICCV 2017]
17. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [Luo et al., ICCV 2017]
18. SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot [Frantar and Alistarh, ArXiv 2023]