Lec1 Introduction ECE618
Course Resources
Course Policy
ECE618 HW Accelerators for ML Dr. Weiwen Jiang, ECE, GMU 2 | George Mason University
Course Information
Instructor Dr. Weiwen Jiang
E-Mail [email protected]
Phone (703)993-5083
Lecture Time Monday 19:20 - 22:00
Location Room 1002, Music/Theater Building
Office Hour Monday 16:30 - 17:30
Office Room 3247, Nguyen Engineering Building
Zoom http://go.gmu.edu/zoom4weiwen
Backup Course Zoom https://go.gmu.edu/ece618 (Need Permission First)
About Me.
• Background
• Researcher at University of Pittsburgh (2017-2019)
• Postdoc at University of Notre Dame (2019-2021)
• George Mason University (2021 - present)
• Research Interests
• HW/SW Co-Design
• Quantum Machine Learning
• Contacts:
• [email protected]
• Nguyen Engineering Building, Room 3247
• (703)993-5083
• https://jqub.ece.gmu.edu/
Teaching Assistant
Course Description
Covers the hardware design principles for deploying different
machine learning algorithms. The emphasis is on understanding
the fundamentals of machine learning and hardware
architectures, and determining plausible methods to bridge them.
Recommended Prerequisites
• ECE 554: Machine Learning for Embedded Systems
• Good C programming skills
• Especially required for the FPGA-related project
Agenda
Course Information
Course Resources
Course Policy
Course Resources
• Blackboard:
• Assignments will be posted and submitted here!
• Online discussion, shared documents, announcements.
• Do NOT upload code in the discussion forum.
• Course Website:
• https://jqub.ece.gmu.edu/2022/01/01/HA4ML/
• Course information (TA time, location, zoom, etc.)
• Slides, readings, and documents will be posted here!
Agenda
Course Information
Course Resources
Course Policy
Grading Policy
● Midterm Exam 10%
● Final Exam 20%
● Research Paper Presentation 20%
● Assignments and Labs 20%
● Project 30%
You Have Been Warned.
Zero Tolerance!
• Whether vaccinated or not, a face mask is required in class
• No plagiarism!
The most common-sense interpretation of “no plagiarism”:
you need to DO your own work.
Agenda
Course Information
Course Resources
Course Policy
Tools for lab
Google Colab
IBM Qiskit
Agenda
Course Information
Course Resources
Course Policy
What Software to Be Accelerated? --- MLP/CNN
Supervised Learning
Example: Classification
• Training
Given labeled pairs, e.g., an image 𝑥𝑖 of a handwritten digit with label 𝑦𝑖 = 3, learn a function 𝒇
• Inference/Execution
Given: unseen data from the test dataset and the learned function 𝒇
Do: 𝒇(unseen image) = 3
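The training/inference split above can be sketched in a few lines. This is a minimal illustration (a nearest-neighbor "learned function", not anything from the slides): `train` memorizes labeled examples, and the returned function 𝒇 labels unseen inputs by their closest training point.

```python
# Minimal supervised-learning sketch: training produces a function f,
# inference applies f to unseen data. All names and data are illustrative.

def train(examples):
    """examples: list of (x_i, y_i) pairs; returns the learned function f."""
    def f(x):
        # predict the label of the closest training point (1-nearest neighbor)
        return min(examples, key=lambda ex: abs(ex[0] - x))[1]
    return f

# toy training set: scalar "images" x_i with digit labels y_i
data = [(0.9, 1), (2.1, 2), (3.0, 3)]
f = train(data)
print(f(2.9))  # an unseen input close to 3.0 -> predicted label 3
```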
What Software to Be Accelerated? --- MLP/CNN
[Figure: a sequence input 𝑥𝑖 (audio) with label 𝑦𝑖 = “can I”]
What Software to Be Accelerated? --- RNN
Intel's 12th Gen “Alder Lake” 10nm Desktop CPU
NVIDIA RTX A6000 Workstation Graphics Card (in my lab)
What Hardware Will Be Covered in This Course?
What Hardware Will Be Covered in This Course?
Quantum computing is a type of computation that harnesses the
collective properties of quantum states, such as superposition,
interference, and entanglement, to perform calculations.
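The superposition and entanglement mentioned in this definition can be seen in a tiny state-vector calculation. This is a plain-Python sketch (not Qiskit, which the labs use): a Hadamard on qubit 0 followed by a CNOT produces the Bell state (|00⟩ + |11⟩)/√2, where measurement yields 00 or 11 with equal probability.

```python
import math

# Plain-Python state-vector sketch of superposition + entanglement.
# A 2-qubit state is a list of four amplitudes [a00, a01, a10, a11].

def apply_h_q0(state):
    """Hadamard on qubit 0 (the left bit): mixes |0b> with |1b>."""
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]

def apply_cnot(state):
    """CNOT with qubit 0 as control: swaps the amplitudes of |10> and |11>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

state = [1.0, 0.0, 0.0, 0.0]            # start in |00>
state = apply_cnot(apply_h_q0(state))   # H then CNOT -> Bell state
probs = [round(abs(a) ** 2, 3) for a in state]
print(probs)  # [0.5, 0.0, 0.0, 0.5] -- measure 00 or 11, each with p = 1/2
```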
Why Need Specialized Hardware Accelerators?
[Image credit]: Prof. Zhiru Zhang @ Cornell
Why Need Specialized Hardware Accelerators?
[Figure: growth of AI compute demand and accuracy-vs-complexity statistics of DNN models]
Sources: https://openai.com/blog/ai-and-compute/
[Bianco, IEEE Access 2018]
[Images credit]: Prof. Callie Hao @ GATech
Why Need Specialized Hardware Accelerators?
An Overview of Hardware Accelerators
• Intel's 12th Gen “Alder Lake” 10nm Desktop CPU
• NVIDIA RTX A6000 Workstation Graphics Card (in my lab)
• PYNQ ZCU Series (102, 104, 106)
• ODROID-XU4 Single Board Computer with Quad Core 2GHz A15, 2GB RAM
• NVIDIA Jetson Nano
• Xilinx Alveo U280 Data Center Accelerator Card
• ASIC
Schedule
Jan. 24 Course Information & Machine Learning and FPGA Accelerator Recap
Expectation & Final Project
• Implement ML on any hardware in a team with 1-3 students
[Figure: hardware options — desktop CPU, RTX A6000 GPU, PYNQ ZCU boards, ODROID-XU4, NVIDIA Jetson Nano, Xilinx Alveo U280, ASIC]
What Did We Learn in ECE 554? (Recap)
[Figure: key concerns — Hardware, Power, Latency, Mapping/Scheduling, Communication]
ECE 554 Course Recap
• Machine Learning Basics:
▪ Different neural networks: MLP, CNN, RNN, RL
▪ Training (gradient descent) and inference of neural networks using PyTorch
▪ Implementing convolution using “for loops”
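The “for loops” convolution from the ECE 554 recap can be sketched as below. This is a minimal single-channel version (no padding, stride 1); the variable names R, C, K follow the parameter naming used later in the convolution lecture.

```python
# Naive 2D convolution with explicit for loops: single input channel,
# no padding, stride 1. Illustrative sketch, not production code.

def conv2d(image, kernel):
    R, C = len(image), len(image[0])     # input rows and columns
    K = len(kernel)                      # (square) kernel size
    out = [[0.0] * (C - K + 1) for _ in range(R - K + 1)]
    for r in range(R - K + 1):           # output rows
        for c in range(C - K + 1):       # output columns
            for i in range(K):           # kernel rows
                for j in range(K):       # kernel columns
                    out[r][c] += image[r + i][c + j] * kernel[i][j]
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]  # sums each pixel with its lower-right neighbor
print(conv2d(img, k))  # [[6.0, 8.0], [12.0, 14.0]]
```

On real accelerators these four loops are exactly what gets reordered, tiled, and unrolled, which is why the course keeps returning to this loop nest.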
Biological Neuron
Human intelligence resides in the brain:
• Approximately 86 billion neurons in the human brain
• The brain is a network of neurons, connected by roughly 10^14 - 10^15 synapses
ML for Embedded Systems (Fall 2021) Dr. Weiwen Jiang, ECE, GMU 36 | George Mason University
Biological Neuron
Neurons work together:
• The cell body processes the information
• Dendrites receive messages from other neurons
• The axon transmits the output to many smaller branches
• Synapses are the contact points between the axon (Neuron 1) and the dendrites (Neuron 2) for message passing
The cell body receives input signals from the dendrites and produces an output signal along the axon, which interacts with the next neurons via synaptic weights.
McCulloch-Pitts (MP) Neuron
The first computational model of a biological neuron, proposed in 1943 by Warren McCulloch and Walter Pitts.
Assumptions:
• Binary devices (i.e., 𝑥𝑖 ∈ {0,1} and 𝑦 ∈ {0,1})
• Identical synaptic weights (i.e., +1)
• Activation function 𝒇 has a fixed threshold 𝜽
Inputs 𝑥0, 𝑥1, 𝑥2 arrive via synapses and dendrites; the cell body computes 𝑔(𝒙), the sum of the inputs, and the axon carries the output 𝑦 = 𝒇(𝑔(𝒙)): 𝑦 = 1 when 𝑔(𝒙) reaches the threshold 𝜽, else 𝑦 = 0.
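The McCulloch-Pitts model is simple enough to state directly in code. A minimal sketch following the assumptions above (binary inputs, +1 weights, fixed threshold θ):

```python
# McCulloch-Pitts neuron: g sums the binary inputs (identical +1 weights),
# f applies a hard threshold theta. Matches the 1943 model's assumptions.

def mp_neuron(x, theta):
    g = sum(x)                       # g(x): identical synaptic weights of +1
    return 1 if g >= theta else 0    # f: activation with fixed threshold

# with theta = 2, three inputs implement a 2-out-of-3 majority function
print(mp_neuron([1, 1, 0], theta=2))  # 1
print(mp_neuron([1, 0, 0], theta=2))  # 0
```

Choosing θ = number of inputs gives logical AND; θ = 1 gives logical OR, which is why this single model was an early universal building block for logic.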
Artificial Neuron Design
◼ Idealized neuron models
◼ Idealization removes complicated details that are not essential for
understanding the main principles.
◼ It allows us to apply mathematics and to make analogies.
Deep Convolutional Neural Networks (CNN) – Lecture 3
Convolution – Lecture 3
Parameters:
• N: input channels
• M: output channels
• K: kernel size
• P: padding size
• S: stride
• D: dilation
• R: rows
• C: columns

CLASS torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
[ref] Aqeel Anwar, What is Transposed Convolutional Layer? https://towardsdatascience.com/what-is-transposed-convolutional-layer-40e5e6e31c11
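The parameters listed above determine the output size of a convolution layer. Following the standard formula used by torch.nn.Conv2d (out = floor((in + 2P − D·(K − 1) − 1)/S) + 1), a small pure-Python helper makes the relationship concrete:

```python
# Output size of a conv layer along one dimension (rows R or columns C),
# using the parameters above: K kernel, P padding, S stride, D dilation.
#   out = floor((in + 2*P - D*(K - 1) - 1) / S) + 1

def conv_out_size(in_size, K, P=0, S=1, D=1):
    return (in_size + 2 * P - D * (K - 1) - 1) // S + 1

# e.g., a 32x32 input with a 3x3 kernel:
print(conv_out_size(32, K=3))             # 30  (no padding shrinks the map)
print(conv_out_size(32, K=3, P=1))        # 32  ("same" padding)
print(conv_out_size(32, K=3, P=1, S=2))   # 16  (strided downsampling)
```

Checking candidate (K, P, S, D) settings against this formula is a quick way to sanity-check layer shapes before writing any accelerator loop nest.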
From Static Image to Sequences of Data
RNN and Feedforward Network – Lecture 5
[Figure: an RNN unrolled over time steps (time=2, time=3), reusing the same weights w1-w4 at every step]
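The weight sharing in the unrolled RNN can be sketched with a scalar recurrent cell. This is an illustrative toy (the weight values are arbitrary placeholders): the same input weight and recurrent weight are applied at every time step, exactly as the unrolled diagram reuses w1-w4.

```python
import math

# Minimal scalar RNN cell: the state h is updated once per time step,
# and the SAME weights (w_in, w_rec) are reused at every step.

def rnn(inputs, w_in=0.5, w_rec=0.8):
    h = 0.0
    for x in inputs:                         # unrolled over time
        h = math.tanh(w_in * x + w_rec * h)  # shared weights each step
    return h

# the final state summarizes the whole sequence
print(round(rnn([1.0, 0.0, 1.0]), 4))
```

Because the weights repeat across steps, a hardware accelerator can keep them on-chip and stream only the inputs and the state, which is the key difference from accelerating a feedforward layer.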
High-Level Synthesis: HLS – Lecture 8
• High-Level Synthesis
‒ Creates an RTL implementation from
C, C++, System C, OpenCL API C
kernel code
‒ Extracts control and dataflow from
the source code
‒ Implements the design based on
defaults and user applied directives
• Many implementations are possible from
the same source description
‒ Smaller designs, faster designs,
optimal designs
‒ Enables design exploration
C Validation and RTL Verification – Lecture 8
• There are two steps to verifying the design
– Pre-synthesis: C Validation
• Validate that the algorithm is correct
– Post-synthesis: RTL Verification
• Verify that the RTL is correct
• C Validation
– A HUGE reason users want to use HLS
• Fast, free verification: validate the algorithm is correct before synthesis
• Follow the test bench tips given earlier
• RTL Verification
– Vivado HLS can co-simulate the RTL with the C test bench
© Copyright 2016 Xilinx
AXI_Stream – Lecture 13
Test Bench – Lecture 13
https://colab.research.google.com/drive/1TufHcDNMftm3bwAfcKEF5Njev0y_v6rM#scrollTo=X3pBQmyNW4rs
Export RTL as IP Core – Lecture 13
Synthesis
Import the IP into Block Design – Lecture 13
Goal: Enable AI for Everyone – Lecture 14
Level 1: Automation of Neural Network Design (e.g., ProxylessNAS, FBNet)
Level 2: Automation of AI System Design (e.g., FNAS)
One Network Cannot Work for All Platforms – Lecture 14
◆ Cloud / Server
• Unlimited resources
• Maximizing accuracy
• AlexNet, ResNet, …
◆ Mobile Phones
• Fixed hardware
• Accuracy vs. latency
• MnasNet, ProxylessNAS, FBNet, …
◆ Hardware Accelerators (e.g., FPGA)
• Hardware design flexibility
• Accuracy, latency, and energy
• FNAS, SkyNet, EDDNet, …
Datasets/Applications, Hardware, and Neural Networks – Lecture 14
Datasets / Applications Neural Networks
Hardware Platforms
FPGA
Manual Design: Expensive, But Builds the Road – Lecture 14
Problems:
• Low efficiency: hundreds or even thousands of GPU hours
• Mono-objective (accuracy), leading to overly complicated networks
AutoML: Differentiable Architecture Search – Lecture 14
Name   Time
DARTS  Jun. 2018

[Figure: timeline from Manual AI Design to Automatic NAS (RL NAS, DARTS); DARTS builds a super net over the search space, trains it, and derives the sub-net]
AutoML: Hardware-Aware NAS – Lecture 14
Name          Time
MnasNet       Jul. 2018
ProxylessNAS  Dec. 2018
FBNet         Dec. 2018

[Figure: latency-aware NAS loop — a controller proposes networks, a trainer measures accuracy, and a performance predictor estimates latency]
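The accuracy and latency signals feeding back to the controller are typically combined into one reward. A sketch in the MnasNet style (the target and exponent values here are illustrative placeholders): accuracy is scaled by how well the candidate meets a latency target, with a negative exponent penalizing slow networks.

```python
# Latency-aware reward sketch, MnasNet-style:
#   reward = acc * (latency / target) ** w,  with w < 0
# so networks slower than the target are penalized, faster ones mildly rewarded.
# target_ms and w are illustrative values, not from any paper's exact setup.

def reward(acc, latency_ms, target_ms=80.0, w=-0.07):
    return acc * (latency_ms / target_ms) ** w

fast = reward(0.74, 60.0)    # under the target: reward slightly above raw acc
slow = reward(0.76, 160.0)   # double the target latency: reward below raw acc
print(fast > 0.74, slow < 0.76)  # True True
```

With a single scalar reward like this, the same RL controller used for accuracy-only NAS can trade accuracy against hardware latency without any change to the search loop itself.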
AutoML: Network-FPGA Co-Design Using NAS – Lecture 14
Name         Time
FNAS (ours)  Jan. 2019
DNN/FPGA     Apr. 2019
SkyNet       Sep. 2019
EDDNet       May 2020

[Figure: timeline from Manual AI Design to Automatic NAS and latency-aware NAS]
• Optimization Approaches
• Deep Reinforcement Learning: RNN based controller
• Gradient Descent: DARTS
• Metaheuristics: Swarm
• Optimization Objective(s):
• Software: Accuracy, Robustness, Fairness, etc.
• Hardware: Latency, Chip Area, Energy Efficiency, etc.
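The search-plus-objectives structure above can be sketched end to end with the simplest optimization approach, random search. Everything here is a toy placeholder (the search space, the accuracy and latency proxies, and the budget are invented for illustration); a real co-design flow would swap in training results and FPGA latency models.

```python
import random

# Toy multi-objective architecture search: random search over a tiny space,
# maximizing an accuracy proxy subject to a hardware latency budget.
# SEARCH_SPACE and both proxy functions are illustrative placeholders.

SEARCH_SPACE = {"depth": [2, 4, 8], "width": [16, 32, 64]}

def proxy_accuracy(arch):        # placeholder: bigger models score higher
    return 0.5 + 0.04 * arch["depth"] + 0.002 * arch["width"]

def proxy_latency(arch):         # placeholder: bigger models are slower
    return arch["depth"] * arch["width"] * 0.01

def search(n_trials=50, latency_budget=2.0, seed=0):
    rng = random.Random(seed)
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        arch = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        if proxy_latency(arch) <= latency_budget:   # hardware objective
            acc = proxy_accuracy(arch)              # software objective
            if acc > best_acc:
                best, best_acc = arch, acc
    return best

print(search())
```

Replacing random sampling with an RNN controller (deep RL), a differentiable relaxation (DARTS), or a swarm metaheuristic changes only how candidates are proposed; the objective evaluation stays the same.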
Programming Platform
https://colab.research.google.com/
George Mason University
4400 University Drive
Fairfax, Virginia 22030
Tel: (703)993-1000