Lec1 Introduction ECE618
Course Resources
Course Policy
ECE618 HW Accelerators for ML Dr. Weiwen Jiang, ECE, GMU 2 | George Mason University
Course Information
Instructor Dr. Weiwen Jiang
E-Mail [email protected]
Phone (703)993-5083
Lecture Time Monday 19:20 - 22:00
Location Room 1002, Music/Theater Building
Office Hour Monday 16:30 - 17:30
Office Room 3247, Nguyen Engineering Building
Zoom http://go.gmu.edu/zoom4weiwen
Backup Course Zoom https://go.gmu.edu/ece618 (Need Permission First)
About Me.
• Background
• Researcher at University of Pittsburgh (2017-2019)
• Postdoc at University of Notre Dame (2019-2021)
• George Mason University (2021 - present)
• Research Interests
• HW/SW Co-Design
• Quantum Machine Learning
• Contacts:
• [email protected]
• Nguyen Engineering Building, Room 3247
• (703)993-5083
• https://jqub.ece.gmu.edu/
Teaching Assistant
Course Description
Covers the hardware design principles for deploying different
machine learning algorithms. The emphasis is on understanding
the fundamentals of machine learning and hardware
architectures, and determining plausible methods to bridge them.
Recommended Prerequisites
• ECE 554: Machine Learning for Embedded Systems
• Good C programming skills
• Especially required for the FPGA-related project
Agenda
Course Information
Course Resources
Course Policy
Course Resources
• Blackboard:
• Assignments will be posted and submitted here!
• Online discussion, shared documents, announcements.
• Do NOT upload code in the discussion forum.
• Course Website:
• https://jqub.ece.gmu.edu/2022/01/01/HA4ML/
• Course information (TA time, location, zoom, etc.)
• Slides, readings, and documents will be posted here!
Agenda
Course Information
Course Resources
Course Policy
Grading Policy
● Midterm Exam 10%
● Final Exam 20%
● Research Paper Presentation 20%
● Assignments and Labs 20%
● Project 30%
You Have Been Warned.
Zero Tolerance!
• Whether vaccinated or not, a face mask is required in class
• No plagiarism!
The most common-sense interpretation of “no plagiarism”:
you need to DO your own work.
Agenda
Course Information
Course Resources
Course Policy
Tools for lab
Google Colab
IBM Qiskit
Agenda
Course Information
Course Resources
Course Policy
What Software to Be Accelerated? --- MLP/CNN
Supervised Learning
Example: Classification
• Training
Given labeled pairs, e.g., an image 𝑥𝑖 of a handwritten digit with label 𝑦𝑖 = 3, learn a function 𝒇
• Inference/Execution
Given: unseen data from the test dataset and the learned function 𝒇
Do: 𝒇(unseen image) = 3
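The training/inference split above can be sketched in a few lines. This is a minimal illustration (a nearest-neighbor "learned function", not anything from the slides): `train` memorizes labeled examples, and the returned function 𝒇 labels unseen inputs by their closest training point.

```python
# Minimal supervised-learning sketch: training produces a function f,
# inference applies f to unseen data. All names and data are illustrative.

def train(examples):
    """examples: list of (x_i, y_i) pairs; returns the learned function f."""
    def f(x):
        # predict the label of the closest training point (1-nearest neighbor)
        return min(examples, key=lambda ex: abs(ex[0] - x))[1]
    return f

# toy training set: scalar "images" x_i with digit labels y_i
data = [(0.9, 1), (2.1, 2), (3.0, 3)]
f = train(data)
print(f(2.9))  # an unseen input close to 3.0 -> predicted label 3
```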
What Software to Be Accelerated? --- MLP/CNN
[Figure: a sequence input 𝑥𝑖 (audio) with label 𝑦𝑖 = “can I”]
What Software to Be Accelerated? --- RNN
Intel's 12th Gen “Alder Lake” 10nm Desktop CPU
NVIDIA RTX A6000 Workstation Graphics Card (in my lab)
What Hardware Will Be Covered in This Course?
What Hardware Will Be Covered in This Course?
Quantum computing is a type of computation that harnesses the
collective properties of quantum states, such as superposition,
interference, and entanglement, to perform calculations.
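The superposition and entanglement mentioned in this definition can be seen in a tiny state-vector calculation. This is a plain-Python sketch (not Qiskit, which the labs use): a Hadamard on qubit 0 followed by a CNOT produces the Bell state (|00⟩ + |11⟩)/√2, where measurement yields 00 or 11 with equal probability.

```python
import math

# Plain-Python state-vector sketch of superposition + entanglement.
# A 2-qubit state is a list of four amplitudes [a00, a01, a10, a11].

def apply_h_q0(state):
    """Hadamard on qubit 0 (the left bit): mixes |0b> with |1b>."""
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]

def apply_cnot(state):
    """CNOT with qubit 0 as control: swaps the amplitudes of |10> and |11>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

state = [1.0, 0.0, 0.0, 0.0]            # start in |00>
state = apply_cnot(apply_h_q0(state))   # H then CNOT -> Bell state
probs = [round(abs(a) ** 2, 3) for a in state]
print(probs)  # [0.5, 0.0, 0.0, 0.5] -- measure 00 or 11, each with p = 1/2
```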
Why Need Specialized Hardware Accelerators?
[Image credit]: Prof. Zhiru Zhang @ Cornell
Why Need Specialized Hardware Accelerators?
[Figure: growth of AI compute demand and accuracy-vs-complexity statistics of DNN models]
Sources: https://openai.com/blog/ai-and-compute/
[Bianco, IEEE Access 2018]
[Images credit]: Prof. Callie Hao @ GATech
Why Need Specialized Hardware Accelerators?
An Overview of Hardware Accelerators
• Intel's 12th Gen “Alder Lake” 10nm Desktop CPU
• NVIDIA RTX A6000 Workstation Graphics Card (in my lab)
• PYNQ ZCU Series (102, 104, 106)
• ODROID-XU4 Single Board Computer with Quad Core 2GHz A15, 2GB RAM
• NVIDIA Jetson Nano
• Xilinx Alveo U280 Data Center Accelerator Card
• ASIC
Schedule
Jan. 24 Course Information & Machine Learning and FPGA Accelerator Recap
Expectation & Final Project
• Implement ML on any hardware in a team with 1-3 students
[Figure: hardware options — desktop CPU, RTX A6000 GPU, PYNQ ZCU boards, ODROID-XU4, NVIDIA Jetson Nano, Xilinx Alveo U280, ASIC]
What Did We Learn in ECE 554? (Recap)
[Figure: key concerns — Hardware, Power, Latency, Mapping/Scheduling, Communication]
ECE 554 Course Recap
• Machine Learning Basics:
▪ Different neural networks: MLP, CNN, RNN, RL
▪ Training (gradient descent) and inference of neural networks using PyTorch
▪ Implementing convolution using “for loops”
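The “for loops” convolution from the ECE 554 recap can be sketched as below. This is a minimal single-channel version (no padding, stride 1); the variable names R, C, K follow the parameter naming used later in the convolution lecture.

```python
# Naive 2D convolution with explicit for loops: single input channel,
# no padding, stride 1. Illustrative sketch, not production code.

def conv2d(image, kernel):
    R, C = len(image), len(image[0])     # input rows and columns
    K = len(kernel)                      # (square) kernel size
    out = [[0.0] * (C - K + 1) for _ in range(R - K + 1)]
    for r in range(R - K + 1):           # output rows
        for c in range(C - K + 1):       # output columns
            for i in range(K):           # kernel rows
                for j in range(K):       # kernel columns
                    out[r][c] += image[r + i][c + j] * kernel[i][j]
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]  # sums each pixel with its lower-right neighbor
print(conv2d(img, k))  # [[6.0, 8.0], [12.0, 14.0]]
```

On real accelerators these four loops are exactly what gets reordered, tiled, and unrolled, which is why the course keeps returning to this loop nest.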
Biological Neuron
Human intelligence resides in the brain:
• Approximately 86 billion neurons in the human brain
• The brain is a network of neurons, connected by roughly 10^14 - 10^15 synapses
ML for Embedded Systems (Fall 2021) Dr. Weiwen Jiang, ECE, GMU 36 | George Mason University
Biological Neuron
Neurons work together:
• The cell body processes the information
• Dendrites receive messages from other neurons
• The axon transmits the output to many smaller branches
• Synapses are the contact points between the axon (Neuron 1) and the dendrites (Neuron 2) for message passing
The cell body receives input signals from the dendrites and produces an output signal along the axon, which interacts with the next neurons via synaptic weights.
McCulloch-Pitts (MP) Neuron
The first computational model of a biological neuron, proposed in 1943 by Warren McCulloch and Walter Pitts.
Assumptions:
• Binary devices (i.e., 𝑥𝑖 ∈ {0,1} and 𝑦 ∈ {0,1})
• Identical synaptic weights (i.e., +1)
• Activation function 𝒇 has a fixed threshold 𝜽
Inputs 𝑥0, 𝑥1, 𝑥2 arrive via synapses and dendrites; the cell body computes 𝑔(𝒙), the sum of the inputs, and the axon carries the output 𝑦 = 𝒇(𝑔(𝒙)): 𝑦 = 1 when 𝑔(𝒙) reaches the threshold 𝜽, else 𝑦 = 0.
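The McCulloch-Pitts model is simple enough to state directly in code. A minimal sketch following the assumptions above (binary inputs, +1 weights, fixed threshold θ):

```python
# McCulloch-Pitts neuron: g sums the binary inputs (identical +1 weights),
# f applies a hard threshold theta. Matches the 1943 model's assumptions.

def mp_neuron(x, theta):
    g = sum(x)                       # g(x): identical synaptic weights of +1
    return 1 if g >= theta else 0    # f: activation with fixed threshold

# with theta = 2, three inputs implement a 2-out-of-3 majority function
print(mp_neuron([1, 1, 0], theta=2))  # 1
print(mp_neuron([1, 0, 0], theta=2))  # 0
```

Choosing θ = number of inputs gives logical AND; θ = 1 gives logical OR, which is why this single model was an early universal building block for logic.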
Artificial Neuron Design
◼ Idealized neuron models
◼ Idealization removes complicated details that are not essential for
understanding the main principles.
◼ It allows us to apply mathematics and to make analogies.
Deep Convolutional Neural Networks (CNN) – Lecture 3
Convolution – Lecture 3
Parameters:
• N: input channels
• M: output channels
• K: kernel size
• P: padding size
• S: stride
• D: dilation
• R: rows
• C: columns

CLASS torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
[ref] Aqeel Anwar, What is Transposed Convolutional Layer? https://towardsdatascience.com/what-is-transposed-convolutional-layer-40e5e6e31c11
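The parameters listed above determine the output size of a convolution layer. Following the standard formula used by torch.nn.Conv2d (out = floor((in + 2P − D·(K − 1) − 1)/S) + 1), a small pure-Python helper makes the relationship concrete:

```python
# Output size of a conv layer along one dimension (rows R or columns C),
# using the parameters above: K kernel, P padding, S stride, D dilation.
#   out = floor((in + 2*P - D*(K - 1) - 1) / S) + 1

def conv_out_size(in_size, K, P=0, S=1, D=1):
    return (in_size + 2 * P - D * (K - 1) - 1) // S + 1

# e.g., a 32x32 input with a 3x3 kernel:
print(conv_out_size(32, K=3))             # 30  (no padding shrinks the map)
print(conv_out_size(32, K=3, P=1))        # 32  ("same" padding)
print(conv_out_size(32, K=3, P=1, S=2))   # 16  (strided downsampling)
```

Checking candidate (K, P, S, D) settings against this formula is a quick way to sanity-check layer shapes before writing any accelerator loop nest.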
From Static Image to Sequences of Data
RNN and Feedforward Network – Lecture 5
[Figure: an RNN unrolled over time steps (time=2, time=3), reusing the same weights w1-w4 at every step]
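The weight sharing in the unrolled RNN can be sketched with a scalar recurrent cell. This is an illustrative toy (the weight values are arbitrary placeholders): the same input weight and recurrent weight are applied at every time step, exactly as the unrolled diagram reuses w1-w4.

```python
import math

# Minimal scalar RNN cell: the state h is updated once per time step,
# and the SAME weights (w_in, w_rec) are reused at every step.

def rnn(inputs, w_in=0.5, w_rec=0.8):
    h = 0.0
    for x in inputs:                         # unrolled over time
        h = math.tanh(w_in * x + w_rec * h)  # shared weights each step
    return h

# the final state summarizes the whole sequence
print(round(rnn([1.0, 0.0, 1.0]), 4))
```

Because the weights repeat across steps, a hardware accelerator can keep them on-chip and stream only the inputs and the state, which is the key difference from accelerating a feedforward layer.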
High-Level Synthesis: HLS – Lecture 8
• High-Level Synthesis
‒ Creates an RTL implementation from
C, C++, System C, OpenCL API C
kernel code
‒ Extracts control and dataflow from
the source code
‒ Implements the design based on
defaults and user applied directives
• Many implementations are possible from
the same source description
‒ Smaller designs, faster designs,
optimal designs
‒ Enables design exploration
C Validation and RTL Verification – Lecture 8
• There are two steps to verifying the design
– Pre-synthesis: C Validation
• Validate that the algorithm is correct
– Post-synthesis: RTL Verification
• Verify that the RTL is correct
• C Validation
– A HUGE reason users want to use HLS
• Fast, free verification: validate the algorithm is correct before synthesis
• Follow the test bench tips given earlier
• RTL Verification
– Vivado HLS can co-simulate the RTL with the C test bench
© Copyright 2016 Xilinx
AXI_Stream – Lecture 13
Test Bench – Lecture 13
https://colab.research.google.com/drive/1TufHcDNMftm3bwAfcKEF5Njev0y_v6rM#scrollTo=X3pBQmyNW4rs
Export RTL as IP Core – Lecture 13
Synthesis
Import the IP into Block Design – Lecture 13
Goal: Enable AI for Everyone – Lecture 14
Level 1: Automation of Neural Network Design (e.g., ProxylessNAS, FBNet)
Level 2: Automation of AI System Design (e.g., FNAS)
One Network Cannot Work for All Platforms – Lecture 14
◆ Cloud / Server
• Unlimited resources
• Maximizing accuracy
• AlexNet, ResNet, …
◆ Mobile Phones
• Fixed hardware
• Accuracy vs. latency
• MnasNet, ProxylessNAS, FBNet, …
◆ Hardware Accelerators (e.g., FPGA)
• Hardware design flexibility
• Accuracy, latency, and energy
• FNAS, SkyNet, EDDNet, …
Datasets/Applications, Hardware, and Neural Networks – Lecture 14
Datasets / Applications Neural Networks
Hardware Platforms
FPGA
Manual Design: Expensive, But Builds the Road – Lecture 14
Problems:
• Low efficiency: hundreds or even thousands of GPU hours
• Mono-objective (accuracy), leading to overly complicated networks
AutoML: Differentiable Architecture Search – Lecture 14
Name   Time
DARTS  Jun. 2018

[Figure: timeline from Manual AI Design to Automatic NAS (RL NAS, DARTS); DARTS builds a super net over the search space, trains it, and derives the sub-net]
AutoML: Hardware-Aware NAS – Lecture 14
Name          Time
MnasNet       Jul. 2018
ProxylessNAS  Dec. 2018
FBNet         Dec. 2018

[Figure: latency-aware NAS loop — a controller proposes networks, a trainer measures accuracy, and a performance predictor estimates latency]
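The accuracy and latency signals feeding back to the controller are typically combined into one reward. A sketch in the MnasNet style (the target and exponent values here are illustrative placeholders): accuracy is scaled by how well the candidate meets a latency target, with a negative exponent penalizing slow networks.

```python
# Latency-aware reward sketch, MnasNet-style:
#   reward = acc * (latency / target) ** w,  with w < 0
# so networks slower than the target are penalized, faster ones mildly rewarded.
# target_ms and w are illustrative values, not from any paper's exact setup.

def reward(acc, latency_ms, target_ms=80.0, w=-0.07):
    return acc * (latency_ms / target_ms) ** w

fast = reward(0.74, 60.0)    # under the target: reward slightly above raw acc
slow = reward(0.76, 160.0)   # double the target latency: reward below raw acc
print(fast > 0.74, slow < 0.76)  # True True
```

With a single scalar reward like this, the same RL controller used for accuracy-only NAS can trade accuracy against hardware latency without any change to the search loop itself.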
AutoML: Network-FPGA Co-Design Using NAS – Lecture 14
Name         Time
FNAS (ours)  Jan. 2019
DNN/FPGA     Apr. 2019
SkyNet       Sep. 2019
EDDNet       May 2020

[Figure: timeline from Manual AI Design to Automatic NAS and latency-aware NAS]
• Optimization Approaches
• Deep Reinforcement Learning: RNN based controller
• Gradient Descent: DARTS
• Metaheuristics: Swarm
• Optimization Objective(s):
• Software: Accuracy, Robustness, Fairness, etc.
• Hardware: Latency, Chip Area, Energy Efficiency, etc.
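The search-plus-objectives structure above can be sketched end to end with the simplest optimization approach, random search. Everything here is a toy placeholder (the search space, the accuracy and latency proxies, and the budget are invented for illustration); a real co-design flow would swap in training results and FPGA latency models.

```python
import random

# Toy multi-objective architecture search: random search over a tiny space,
# maximizing an accuracy proxy subject to a hardware latency budget.
# SEARCH_SPACE and both proxy functions are illustrative placeholders.

SEARCH_SPACE = {"depth": [2, 4, 8], "width": [16, 32, 64]}

def proxy_accuracy(arch):        # placeholder: bigger models score higher
    return 0.5 + 0.04 * arch["depth"] + 0.002 * arch["width"]

def proxy_latency(arch):         # placeholder: bigger models are slower
    return arch["depth"] * arch["width"] * 0.01

def search(n_trials=50, latency_budget=2.0, seed=0):
    rng = random.Random(seed)
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        arch = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        if proxy_latency(arch) <= latency_budget:   # hardware objective
            acc = proxy_accuracy(arch)              # software objective
            if acc > best_acc:
                best, best_acc = arch, acc
    return best

print(search())
```

Replacing random sampling with an RNN controller (deep RL), a differentiable relaxation (DARTS), or a swarm metaheuristic changes only how candidates are proposed; the objective evaluation stays the same.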
Programming Platform
https://colab.research.google.com/
George Mason University
4400 University Drive
Fairfax, Virginia 22030
Tel: (703)993-1000