
ECE618

Hardware Accelerators for Machine Learning


(Spring 2022)

Lecture 1: Course Information & Machine Learning and FPGA Accelerator Recap
Weiwen Jiang, Ph.D.
Electrical and Computer Engineering
George Mason University
[email protected]
Agenda
Course Information

Course Resources

Course Policy

Tools for Lab

Motivation and Schedule

Course Information
Instructor Dr. Weiwen Jiang
E-Mail [email protected]
Phone (703)993-5083
Lecture Time Monday 19:20 - 22:00
Location Room 1002, Music/Theater Building
Office Hour Monday 16:30 - 17:30
Office Room 3247, Nguyen Engineering Building
Zoom http://go.gmu.edu/zoom4weiwen
Backup Course Zoom https://go.gmu.edu/ece618 (Need Permission First)

About Me.
• Background
• Researcher at University of Pittsburgh (2017-2019)
• Postdoc at University of Notre Dame (2019-2021)
• George Mason University (2021 - present)
• Research Interests
• HW/SW Co-Design
• Quantum Machine Learning
• Contacts:
[email protected]
• Nguyen Engineering Building, Room 3247
• (703)993-5083
• https://jqub.ece.gmu.edu/

Teaching Assistant

Yi Sheng (Ph.D. Candidate)


[email protected]
https://jqub.ece.gmu.edu/yi/
Office Hours: TBD

Course Description
Covers the hardware design principles to deploy different machine
learning algorithms. The emphasis is on understanding the
fundamentals of machine learning and hardware architectures and on
determining plausible methods to bridge them.

Topics include precision scaling, in-memory computing,
hyperdimensional computing, architectural modifications, GPUs and
vector architectures, and quantum computing, as well as recent
hardware programming tools such as Xilinx Vitis AI, Xilinx HLS,
and IBM Qiskit.

Recommended Prerequisites
• ECE 554: Machine Learning for Embedded Systems

• Good C programming skills
• Especially required for the FPGA-related project

• Familiarity with Python and PyTorch

Agenda
Course Information

Course Resources

Course Policy

Tools for Lab

Motivation and Schedule

Course Resources
• Blackboard:
• Assignments will be posted and submitted here!
• Online discussion, shared documents, announcements.
• Do NOT upload code in the discussion forum.
• Course Website:
• https://jqub.ece.gmu.edu/2022/01/01/HA4ML/
• Course information (TA time, location, zoom, etc.)
• Slides, readings, and documents will be posted here!

Agenda
Course Information

Course Resources

Course Policy

Tools for Lab

Motivation and Schedule

Grading Policy
● Midterm Exam 10%
● Final Exam 20%
● Research Paper Presentation 20%
● Assignments and Labs 20%
● Project 30%

You Have Been Warned.
Zero Tolerance!
• Whether vaccinated or not, a face mask is required in class

• Request Zoom access for a few classes if needed


You Have Been Warned.
Zero Tolerance!
• Lecture content and materials should NOT go online
without explicit permission

• No plagiarism!
The most common-sense way to interpret “no plagiarism”:
you need to DO your own work.
Agenda
Course Information

Course Resources

Course Policy

Tools for Lab

Motivation and Schedule

Tools for Lab
Google Colab

Xilinx High-Level Synthesis

IBM Qiskit

Agenda
Course Information

Course Resources

Course Policy

Tools for Lab

Motivation and Schedule

What Software to Be Accelerated? --- MLP/CNN
Supervised Learning
Example: Classification

Training
Given: Labeled data as training dataset
(𝑥𝑖, 𝑦𝑖): 𝑥𝑖 training data, 𝑦𝑖: label
𝑥𝑖 = [image of a handwritten digit], 𝑦𝑖 = 3

Output: A learned function 𝒇 from X to Y
𝒇: 𝑥 ↦ 𝑦

Inference/Execution
Given: Unseen data as test dataset, and the learned function 𝒇
Do: 𝒇([unseen image]) = 3
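As a concrete, hedged illustration of this training/inference split, here is a minimal PyTorch sketch; the linear model, random stand-in data, and hyperparameters are placeholders for illustration only, not the course's reference implementation:

import torch
import torch.nn as nn

# Toy labeled training set: 100 flattened 28x28 "images" (random
# stand-ins for digit images) with labels y_i in {0, ..., 9}.
x_train = torch.randn(100, 28 * 28)
y_train = torch.randint(0, 10, (100,))

# The function f: x -> y to be learned, here a single linear layer.
f = nn.Linear(28 * 28, 10)
optimizer = torch.optim.SGD(f.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Training: fit f to the labeled pairs (x_i, y_i).
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(f(x_train), y_train)
    loss.backward()
    optimizer.step()

# Inference/Execution: apply the learned f to unseen data.
x_test = torch.randn(1, 28 * 28)
print(f(x_test).argmax(dim=1))  # predicted label, e.g., tensor([3])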

What Software to Be Accelerated? --- MLP/CNN

[Figure: a CNN classifying an input image as “Cat?” or “Dog?”]

• Local receptive fields
• Shared weights
• Pooling (subsampling)
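A quick way to see why local receptive fields and shared weights matter is to count parameters; a small comparison sketch, with layer sizes chosen only for illustration:

import torch.nn as nn

# Fully-connected: one weight per (input pixel, output unit) pair for
# a 32x32 RGB input mapped to 16 feature maps of the same size.
fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)

# Convolutional: 16 output channels, one shared 3x3 filter per
# (input channel, output channel) pair, reused at every position.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(count(fc))    # 50,348,032 parameters (weights + biases)
print(count(conv))  # 448 = 3*3*3*16 shared weights + 16 biases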
What Software to Be Accelerated? --- RNN
Supervised Learning
Example: Classification

Training
Given: Labeled data as training dataset
(𝑥𝑖, 𝑦𝑖): 𝑥𝑖 training data, 𝑦𝑖: label
𝑥𝑖 = [speech clip], 𝑦𝑖 = “can I”   (SEP-28k Dataset)

Output: A learned function 𝒇 from X to Y
𝒇: 𝑥 ↦ 𝑦

Inference/Execution
Given: Unseen data as test dataset, and the learned function 𝒇
Do: 𝒇([speech clip]) = “brown fox”

What Software to Be Accelerated? --- RNN

• Image captioning (image → words)
• Sentiment classification (words → sentiment)
• Translation (words → words)
• Video frame classification (frames → classes)



What Hardware Will Be Covered in This Course?
The von Neumann architecture, also known as the Princeton
architecture, merges program instruction memory and data memory:
instruction addresses and data addresses point to different
locations in the same physical memory, so instructions and data
share the same word width.

[Examples: Intel 12th Gen “Alder Lake” 10 nm desktop CPU; NVIDIA RTX A6000 workstation graphics card (in my lab); ODROID-XU4 single-board computer with quad-core 2 GHz A15 and 2 GB RAM; NVIDIA Jetson Nano]
What Hardware Will Be Covered in This Course?
Streaming architecture: data items are pushed in and out as
sequential streams, and the instructions are mapped onto
programmable circuit units along the path from the input ports to
the output ports. Instead of fetching instructions and data back
and forth from memory, the computation is performed as the data
streams flow through the circuit units in one pass.
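As a software analogy only (not how the hardware is actually built), a chain of Python generators mimics the idea: each stage acts like a fixed circuit unit on the input-to-output path, and items flow through all stages in one pass:

# Each stage is a fixed "circuit unit"; data items stream through all
# stages in one pass, with no instruction fetch from a shared memory.
def scale(stream, factor):      # stage 1: a multiplier unit
    for item in stream:
        yield item * factor

def clamp(stream, lo, hi):      # stage 2: a saturation unit
    for item in stream:
        yield max(lo, min(hi, item))

inputs = iter([1, 5, -3, 8])    # input port pushes a sequential stream
print(list(clamp(scale(inputs, 2), 0, 10)))  # [2, 10, 0, 10]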

[Examples: PYNQ ZCU series boards (102, 104, 106); Xilinx Alveo U280 data center accelerator card; ASICs]
What Hardware Will Be Covered in This Course?

In-memory computing is the technique of running computer
calculations entirely in computer memory (e.g., in RAM).

[Yan, Advanced Intelligent Systems 2019]

What Hardware Will Be Covered in This Course?
Quantum computing is a type of computation that harnesses the
collective properties of quantum states, such as superposition,
interference, and entanglement, to perform calculations.

Classical bit: 𝑋 = 0 𝒐𝒓 1

Quantum bit (qubit): a superposition of |0⟩ and |1⟩
|𝜓⟩ = 𝑎0|0⟩ + 𝑎1|1⟩

Reading out information from a qubit (measurement) is
non-deterministic: the outcome is 0 with probability 𝑎0² and 1 with
probability 𝑎1², where 𝑎0² + 𝑎1² = 100% (e.g., 40% + 60% = 100%).
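Since the course uses IBM Qiskit, here is a minimal sketch of the 40%/60% example above using qiskit.quantum_info; the amplitudes are chosen to match the slide:

import math
from qiskit.quantum_info import Statevector

# |psi> = a0|0> + a1|1> with a0^2 = 40% and a1^2 = 60%.
a0, a1 = math.sqrt(0.4), math.sqrt(0.6)
psi = Statevector([a0, a1])

print(psi.probabilities_dict())  # {'0': 0.4, '1': 0.6} (up to float error)
print(psi.sample_memory(5))      # e.g. ['1' '0' '1' '1' '0'] -- non-deterministic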

Why Do We Need Specialized Hardware Accelerators?

• Specialized High-Efficiency Computing!

• Why specialization?
• Power constraint of modern computers

[Figure: device power budgets ranging from << 1 W to ~100 W; image credit: Prof. Zhiru Zhang @ Cornell]
Why Do We Need Specialized Hardware Accelerators?

• Specialized High-Efficiency Computing!

• Why specialization?
• Power constraint of modern computers
• Inefficiency of general-purpose computing

Embedded Processor Energy Breakdown
[Figure: arithmetic 6%, clock and control 24%, data supply 28%, instruction supply 42% — roughly 70% of the energy goes to supplying data and instructions rather than computing; image credit: Prof. Callie Hao @ GATech]
Why Do We Need Specialized Hardware Accelerators?

• Specialized High-Efficiency Computing!

• Why specialization?
• Power constraint of modern computers
• Inefficiency of general-purpose computing
• Data and computation explosion (big data, AI)

https://openai.com/blog/ai-and-compute/
[Bianco, IEEE Access 2018; image credit: Prof. Callie Hao @ GATech]
Why Do We Need Specialized Hardware Accelerators?

• Specialized High-Efficiency Computing!

• Why specialization?
• Power constraint of modern computers
• Inefficiency of general-purpose computing
• Data and computation explosion (big data, AI)
• Real-time processing requirement (e.g., 30 FPS)

[Bianco, IEEE Access 2018; image credit: Prof. Callie Hao @ GATech]
An Overview of Hardware Accelerators

[Examples: Intel 12th Gen “Alder Lake” 10 nm desktop CPU; NVIDIA RTX A6000 workstation graphics card (in my lab); PYNQ ZCU series boards (102, 104, 106); ODROID-XU4 single-board computer with quad-core 2 GHz A15 and 2 GB RAM; NVIDIA Jetson Nano; Xilinx Alveo U280 data center accelerator card; ASICs]

Schedule


Session I: Classical Computing Accelerators for Machine Learning


Date | Topic
Jan. 24 | Course Information & Machine Learning and FPGA Accelerator Recap
Jan. 31 | Vector Architectures, FPGAs and GPU Architectures
Feb. 7 | ASIC Accelerators


Schedule

Session II: Novel Post-Moore Computing Accelerators for ML


Date | Topic
Feb. 14 | In-Memory Computing Accelerator Design
Feb. 21 | Neuromorphic Accelerators
Feb. 28 | Hyperdimensional Computing Accelerators
Mar. 07 | Quantum Neural Network Accelerators

Schedule

Session III: Other Accelerator Related Topics


Date | Topic
Mar. 28 | Project Proposal
Apr. 04 | Distributed Learning
Apr. 11 | Hands-on Accelerator Design (1)
Apr. 18 | Project Overview
Apr. 25 | Hands-on Accelerator Design (2)
May 02 | Project Presentations
May 11-18 | Final exam

Expectation & Final Project
• Implement ML on any hardware in a team of 1-3 students


What Did We Learn in ECE 554? (Recap)

[Figure: the ECE 554 landscape, reconstructed from the diagram]
Software
• Applications: Computer Vision, Natural Language, Games, …
• ML/DL Algorithms: ANN, MLP, CNN, RNN, LSTM, RL, Transformer, MLP-Mixer, …
• Optimization: Network Inference, Network Training, Model Compression, Network Design

Hardware
• Embedded Systems Performance Model: Resource Usage, Power, Latency
• Optimization: Hardware Design, Mapping/Scheduling, Communication
ECE 554 Course Recap
• Machine Learning Basics:
▪ Different neural networks: MLP, CNN, RNN, RL
▪ Training (gradient descent) and inferencing neural networks using PyTorch
▪ Implementing convolution using “for loops” (a sketch follows below)
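For reference, a minimal NumPy version of that “for loop” convolution (stride 1, no padding); the parameter names follow the convolution slide later in this deck: N input channels, M output channels, K kernel size, R rows, C columns:

import numpy as np

def conv2d_for_loops(x, w):
    # x: (N, R, C) input feature maps; w: (M, N, K, K) kernels
    M, N, K, _ = w.shape
    _, R, C = x.shape
    y = np.zeros((M, R - K + 1, C - K + 1))
    for m in range(M):                      # output channels
        for r in range(R - K + 1):          # output rows
            for c in range(C - K + 1):      # output columns
                for n in range(N):          # input channels
                    for i in range(K):      # kernel rows
                        for j in range(K):  # kernel columns
                            y[m, r, c] += x[n, r + i, c + j] * w[m, n, i, j]
    return y

print(conv2d_for_loops(np.ones((3, 8, 8)), np.ones((4, 3, 3, 3))).shape)  # (4, 6, 6)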

Biological Neuron
Human intelligence resides in the brain:
• Approximately 86 billion neurons in the human brain
• The brain is a network of neurons, connected by nearly 10¹⁴ − 10¹⁵ synapses

How to equip the machine with intelligence?
• Understand how the brain network is constructed
• Mimic the brain

Biological Neuron
Neurons work together:
• The cell body processes the information
• Dendrites receive messages from other neurons
• The axon transmits the output to many smaller branches
• Synapses are the contact points between the axon (Neuron 1) and dendrites (Neuron 2) for message passing

The cell body receives input signals from the dendrites and produces an output signal
along the axon, which interacts with the next neurons via synaptic weights

Synaptic weights are learnable to perform useful computations
(e.g., recognizing objects, understanding language, making plans, controlling the body)
Artificial Neuron Design
◼ Idealized neuron models
◼ Idealization removes complicated details that are not essential for
understanding the main principles.
◼ It allows us to apply mathematics and to make analogies.

McCulloch-Pitts (MP) Neuron
The first computational model of a biological neuron (1943), by Warren McCulloch and Walter Pitts

[Figure: inputs 𝑥0, 𝑥1, 𝑥2 arrive via synapses and dendrites at the cell body, which computes 𝑦 = 𝑓(𝑔(𝒙)) and sends the output along the axon; a step plot shows y jumping from 0 to 1 at the threshold 𝜽]

Assumptions:
• Binary devices (i.e., 𝑥𝑖 ∈ {0,1} and 𝑦 ∈ {0,1})
• Identical synaptic weights (i.e., +1)
• Activation function 𝒇 has a fixed threshold 𝜽
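These assumptions make the MP neuron a one-liner; a minimal sketch, with the threshold value chosen only for illustration:

def mp_neuron(x, theta):
    # Binary inputs, identical +1 weights: g(x) is just the sum;
    # f fires (outputs 1) once g(x) reaches the fixed threshold theta.
    return 1 if sum(x) >= theta else 0

# With theta = 2, a 3-input MP neuron computes "at least two inputs are 1".
print(mp_neuron([1, 1, 0], theta=2))  # 1
print(mp_neuron([1, 0, 0], theta=2))  # 0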

Artificial Neuron Design
◼ Idealized neuron models
◼ Idealization removes complicated details that are not essential for
understanding the main principles.
◼ It allows us to apply mathematics and to make analogies.

◼ Break the limitations of the MP Neuron

◼ What about non-boolean inputs (say, real numbers)?
◼ What if we want to assign more weight (importance) to some inputs?
◼ What about functions which are not linearly separable?
◼ Do we always need to hand-code the threshold?
Multi-Layer Perceptron (MLP) – Lecture 2
• Input layer, output layer and hidden layers
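A minimal PyTorch sketch of such an MLP; the layer widths are placeholders (e.g., 784 for a flattened 28x28 image and 10 output classes):

import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> hidden layer 1
    nn.ReLU(),
    nn.Linear(128, 64),   # hidden layer 1 -> hidden layer 2
    nn.ReLU(),
    nn.Linear(64, 10),    # hidden layer 2 -> output layer
)
print(mlp(torch.randn(1, 784)).shape)  # torch.Size([1, 10])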

Deep Convolutional Neural Networks (CNN) – Lecture 3

● One of the most widely used types of deep network


● Fully-connected nets treat far-apart input pixels the same as those close by
– Hence spatial information must be inferred from the training data
● In contrast, CNN proposes an architecture that inherently tries to take
advantage of the spatial structure
– Such an architecture makes convolutional networks fast to train
– This, in turn, helps us train even deeper, many-layer networks
● Today, deep convolutional networks or some close variants are used in
solving many interesting problems that go beyond image classification
● We will use image classification as a driving use case to explain the main
concepts behind CNN

Convolution – Lecture 3

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1,
padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros',
device=None, dtype=None)

Parameters:
• N: input channels
• M: output channels
• K: kernel size
• P: padding size
• S: stride
• D: dilation
• R: rows
• C: columns

[ref] Aqeel Anwar, What is Transposed Convolutional Layer? https://towardsdatascience.com/what-is-transposed-convolutional-layer-40e5e6e31c11
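A short usage sketch tying the slide's notation to torch.nn.Conv2d, with the standard output-size formula; all parameter values below are arbitrary examples:

import torch
import torch.nn as nn

# N input channels, M output channels, K kernel, P padding, S stride,
# D dilation, and an R x C input (all values arbitrary examples).
N, M, K, P, S, D, R, C = 3, 16, 3, 1, 2, 1, 32, 32
conv = nn.Conv2d(in_channels=N, out_channels=M, kernel_size=K,
                 stride=S, padding=P, dilation=D)

y = conv(torch.randn(1, N, R, C))  # (batch, channels, rows, columns)
# Output rows = floor((R + 2P - D*(K-1) - 1) / S) + 1, likewise for columns.
print(y.shape)  # torch.Size([1, 16, 16, 16])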

From Static Image to Sequences of Data

RNN and Feedforward Network – Lecture 5

[Figure: an RNN unrolled through time = 0, 1, 2, 3; the same weights w1, w2, w3, w4 appear at every time step]

• Assume each connection has 1 unit delay
• An RNN can be unrolled into a feedforward network
• Each layer keeps on reusing the same weights

Courtesy of Geoff Hinton
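A minimal sketch of this unrolling in PyTorch: the same two weight matrices are reused at every time step; the sizes and the tanh nonlinearity are illustrative choices:

import torch
import torch.nn as nn

w_xh = nn.Linear(4, 8)  # input -> hidden
w_hh = nn.Linear(8, 8)  # hidden(t-1) -> hidden(t): the 1-unit delay

h = torch.zeros(1, 8)   # hidden state at time = 0
for t in range(4):      # unrolled over time = 0, 1, 2, 3
    x_t = torch.randn(1, 4)
    h = torch.tanh(w_xh(x_t) + w_hh(h))  # same weights reused each step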



RNN and Feedforward Network – Lecture 5

[Figure: from Goodfellow et al., Deep Learning]



ECE 554 Course Recap
• Machine Learning Basics:
▪ Different neural networks: MLP, CNN, RNN, RL
▪ Training (gradient descent) and inferencing neural networks using PyTorch
▪ Implementing convolution using “for loops”
• Put Machine Learning onto Embedded Systems:
▪ Introduction to HLS (Lec 8-9)
o Using MLP as example in class
o Using CNN as example in Labs, which is based on the “for loop” implementation
▪ Model compression on FPGA: pruning and quantization (Lec 10-11)
▪ Neural architecture search (Lec 12)
o Using RNN-based RL as controller/optimizer
o Using Gradient Descent approach for optimization
▪ Data movement in HLS-based FPGA implementation (Lec 13)
▪ Co-explore neural architectures and FPGA design (Lec 14)

High-Level Synthesis: HLS – Lecture 8
• High-Level Synthesis
‒ Creates an RTL implementation from
C, C++, System C, OpenCL API C
kernel code
‒ Extracts control and dataflow from
the source code
‒ Implements the design based on
defaults and user applied directives
• Many implementations are possible from
the same source description
‒ Smaller designs, faster designs,
optimal designs
‒ Enables design exploration

C Validation and RTL Verification – Lecture 8
• There are two steps to verifying the design
– Pre-synthesis: C Validation
• Validate the algorithm is correct
– Post-synthesis: RTL Verification
• Verify the RTL is correct
• C Validation
– A HUGE reason users want to use HLS
• Fast, free verification
− Validate the algorithm is correct before synthesis
• Follow the test bench tips given
• RTL Verification
• Vivado HLS can co-simulate the RTL with the original test bench

[Figure: flow — Validate C, then Verify RTL]
© Copyright 2016 Xilinx
AXI_Stream – Lecture 13

Test Bench – Lecture 13

https://colab.research.google.com/drive/1TufHcDNMftm3bwAfcKEF5Njev0y_v6rM#scrollTo=X3pBQmyNW4rs

Export RTL as IP Core – Lecture 13

Synthesis

Import the IP into Block Design – Lecture 13

Goal: Enable AI for Everyone – Lecture 14

AI Democratization — Two Levels

Level 1: Automation of Neural Network Design
(NAS: NASNet, MNasNet, FNAS, ProxylessNAS, FBNet, ……)

Level 2: Automation of AI System Design
One Network Cannot Work for All Platforms – Lecture 14
◆ Cloud / Server
• Unlimited Resource
• Maximizing Accuracy
• AlexNet, VGGNet, ResNet, …

◆ Mobile Phones
• Fixed Hardware
• Accuracy vs. Latency
• MnasNet, ProxylessNAS, …

◆ Hardware Accelerators (e.g., FPGA)
• Hardware Design Flexibility
• Accuracy, Latency, and Energy
• FNAS, SkyNet, EDDNet, …

[Figure: networks arranged by target platform — AlexNet, ResNet, NASNet, MobileNet, MnasNet, FBNet, ProxylessNAS, SkyNet, EDDNet, FNAS]
Datasets/Applications, Hardware, and Neural Networks – Lecture 14
[Figure: datasets/applications, neural networks, and hardware platforms (e.g., FPGA) must be considered together]

Manual Design: Expensive, But Builds the Road – Lecture 14

1 year for only 1 application

Name | Year | Acc. (Top-5)
AlexNet | 2012 | 83.4%
ZFNet | 2013 | 88.3%
VGGNet | 2014 | 92.7%
ResNet | 2015 | 96.4%
GoogleNet | 2016 | 96.9%

[Figure: Manual AI Design]

Problem
• Domain knowledge and excessive labor
• It takes too long to devise new architectures
AutoML: Neural Architecture Search (NAS) – Lecture 14
Reinforcement Learning Based NAS
[Figure: the controller (RNN) samples an architecture NN with probability p; the network is trained from scratch to obtain accuracy A; the gradient of p is scaled by A to update the controller. Evolution: Manual AI Design → Automatic NAS]

Problem
• Low efficiency: hundreds or even thousands of GPU hours
• Mono-objective (accuracy only), leading to overly complicated networks
AutoML: Differentiable Architecture Search – Lecture 14

Name | Time
DARTS | Jun. 2018

[Figure: DARTS flow — search space → super net → trained super net → sub-net]
AutoML: Hardware-Aware NAS – Lecture 14

Name | Time
MnasNet | Jul. 2018
ProxylessNAS | Dec. 2018
FBNet | Dec. 2018

Latency-Aware NAS
[Figure: controller → trainer → performance predictor; the trainer reports accuracy and the predictor reports latency]
AutoML: Network-FPGA Co-Design Using NAS – Lecture 14

Name | Time
FNAS (ours) | Jan. 2019
DNN/FPGA | Apr. 2019
SkyNet | Sep. 2019
EDDNet | May 2020

[Figure: FNAS co-designs the neural architecture and the FPGA implementation]
FNAS: DAC’19 (Best Paper Nomination), TCAD’20 (Best Paper Award)
How to Conduct Neural Architecture Search – Lecture 14
• Selection of the Backbone Architecture
• VGG (NAS with RL, FNAS), GoogLeNet (NASNet), MobileNet (FBNet, ProxylessNAS), etc.

• Determination of the Search Space


• Software: Number of Channels, Kernel Size, Convolution Type, etc.
• Hardware: Loop Tiling Parameters, Loop Order, Schedule, etc.

• Optimization Approaches (see the toy sketch below)
• Deep Reinforcement Learning: RNN based controller
• Gradient Descent: DARTS
• Metaheuristics: Swarm

• Optimization Objective(s):
• Software: Accuracy, Robustness, Fairness, etc.
• Hardware: Latency, Chip Area, Energy Efficiency, etc.
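To make these pieces concrete, here is a toy sketch of a NAS loop over a tiny software search space; random sampling stands in for the RL/gradient-based optimizers named above, and evaluate() is a placeholder for training a candidate and measuring the chosen objectives:

import random

search_space = {"channels": [16, 32, 64], "kernel_size": [3, 5, 7]}

def evaluate(arch):
    # Placeholder: a real NAS trains the candidate network and measures
    # accuracy (software) and/or latency, area, energy (hardware).
    return random.random()

best_arch, best_score = None, -1.0
for _ in range(20):  # sample 20 candidate architectures
    arch = {k: random.choice(v) for k, v in search_space.items()}
    score = evaluate(arch)
    if score > best_score:
        best_arch, best_score = arch, score
print(best_arch, best_score)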
Programming Platform

https://colab.research.google.com/

George Mason University
4400 University Drive
Fairfax, Virginia 22030
Tel: (703)993-1000

