2. On-device AI for Smartphones
• On-device AI for Smartphones: Hardware, Software, Model Optimization and Benchmarking
Artificial Intelligence
• AI Algorithms & Hardware Requirements
AI Hardware Accelerators
• Overview & Opportunities
State-of-The-Art AI Accelerators
• Recent AI Hardware Accelerators
Summary
Artificial Intelligence (AI)
AI (Training and Inference)
AI (Inference)
• Lowest latency
• Accelerate whole application
• Match the speed of AI innovation
Software & Hardware Options of an ML Inference System
(Figure: Visualization SOM)
AI Algorithms (Deep Learning)
Neural Networks
SOURCE: [1]. Choi, S., Sim, J., Kang, M., Choi, Y., Kim, H. and Kim, L.S., 2020. An Energy-Efficient Deep Convolutional Neural Network Training Accelerator for In Situ Personalization on Smart Devices. IEEE Journal of Solid-State Circuits, 55(10), pp.2691-2702.
[2]. Song, L., Qian, X., Li, H. and Chen, Y., 2017, February. Pipelayer: A pipelined reram-based accelerator for deep learning. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (pp. 541-552). IEEE.
AI Algorithms (Timeline for Computer Vision Models)
(Figure: timeline of computer vision models from 1998 to 2017, including Inception-v1 (2015) and Inception-v4 (2017))
• Top-1 accuracy is the conventional version of accuracy: it considers only the single class with the highest probability. According to top-1 accuracy, the prediction (cherry: 0.35) is wrong.
• Top-5 accuracy uses the top 5 classes instead of 1 class. According to top-5 accuracy, the prediction is correct, since blueberry is still in the top-5 results.
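A minimal Python sketch of how top-1 and top-5 accuracy are decided for a single prediction (the class list and probabilities are made-up values mirroring the cherry/blueberry example above):

```python
import numpy as np

def top_k_correct(probs: np.ndarray, true_idx: int, k: int) -> bool:
    """True if the ground-truth class is among the k highest-probability classes."""
    top_k = np.argsort(probs)[::-1][:k]
    return true_idx in top_k

classes = ["cherry", "strawberry", "blueberry", "raspberry", "currant", "grape"]
probs = np.array([0.35, 0.25, 0.20, 0.10, 0.06, 0.04])   # model output
true_idx = classes.index("blueberry")                     # ground truth

print(top_k_correct(probs, true_idx, k=1))  # False: top-1 prediction is cherry
print(top_k_correct(probs, true_idx, k=5))  # True: blueberry is in the top 5
```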
AI Accelerators (Hardware Accelerators)
• Hardware acceleration is the use of computer hardware specially made to perform some functions more efficiently than is possible in software running on a general-purpose processing unit alone.
• It combines the flexibility of general-purpose processors, such as CPUs, with the efficiency of specialized hardware, such as GPUs and ASICs, to increase efficiency by orders of magnitude. For example, visualization workloads may be offloaded onto a graphics card to enable faster, higher-quality playback of videos and games.
• General-purpose (GP) hardware (HW) uses arithmetic blocks for basic calculations executed serially, which is not suitable for high-performance deep learning techniques.
• High-bandwidth memory: Specialized AI hardware is estimated to provide 4–5 times more memory bandwidth than traditional chips. This bandwidth is necessary for parallel processing: AI applications require significantly more bandwidth between processors for efficient performance.
AI Accelerators Assessment Parameters
Processing Speed | Power Requirements | Device Size | Total Cost
Processing Speed: AI hardware enables faster training and inference using neural networks.
• Faster training enables ML experts to try different DL approaches.
• It also lets them optimize the structure of their neural networks (hyperparameter optimization).
• Faster inference (e.g., predictions) is critical for applications like autonomous driving.
Device Size: Device size is very important for IoT applications, mobile phones, and other small devices.
Total Cost: The cost of the device is extremely crucial for any procurement decision.
AI Accelerators Assessment Parameters
Choosing an accelerator is challenging, as the chip needs to be supported by hardware and software that let developers build applications on it.
• Standalone platform: personal computers, on-board devices, mobile devices
• Server-based platform
• Cloud-based platform
Performance Metrics for AI Accelerators
Floating Point Operations Per Second (FLOPS, flops, or flop/s): a measure of computer performance in floating-point calculations.
• For numerical workloads, it is therefore a more accurate measure than instructions per second.
Tera (trillions of) Operations Per Second (TOPS): a measure of the maximum achievable throughput, not a measure of actual throughput.
• Most operations are Multiply-and-Accumulate (MAC) operations.
• Therefore, TOPS = (number of MAC units) × (frequency of MAC operations) × 2 (see the sketch below).
• Even TOPS alone is not enough information to judge real performance.
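As a quick illustration of the peak-throughput formula above, a small Python sketch (the MAC count and clock frequency are hypothetical example values, not a specific product's specification):

```python
def peak_tops(num_mac_units: int, mac_frequency_hz: float) -> float:
    """Peak throughput in TOPS; each MAC counts as 2 operations (multiply + add)."""
    return num_mac_units * mac_frequency_hz * 2 / 1e12

# Hypothetical accelerator: 4096 MAC units clocked at 1 GHz
print(peak_tops(4096, 1e9))  # 8.192 peak TOPS; actual throughput will be lower
```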
Performance Metrics for AI Accelerators
GPUs: Nvidia GeForce RTX 3070, RTX 3080, AMD Radeon RX 6900 XT, RX 6700 XT, etc.
Emerging Technologies: Processing in-memory (PIM), neuromorphic computing, quantum computing, AI wafer chips, analog memory-based technologies, etc.
AI Accelerators Design Approaches
• Central Processing Units (CPUs): These are the general-purpose processors mostly used in standalone personal computers (Intel Core, AMD Ryzen, etc.).
• Graphics Processing Units (GPUs): They were originally designed to accelerate graphics processing through parallel computing. The same approach is effective for training complex deep learning algorithms.
• Wafer Chips: To increase packaging density, an entire silicon wafer containing trillions of transistors is used as a single chip (e.g., Cerebras). Its ~72 square inch silicon wafer holds ~1.2 trillion transistors and can therefore support ~400 thousand processing cores.
• Neural Processing Units (NPUs): The architecture provides parallel computing and pooling to increase overall performance. It is specialized for Convolutional Neural Network (CNN) applications. The architecture can be reconfigured to switch between models in real time, which allows creating optimized hardware depending on the needs of the application.
AI Accelerators Design Approaches
• Neuromorphic Architectures: These attempt to mimic brain cells using novel approaches from adjacent fields such as materials science and neuroscience. Such chips can have an advantage in speed and efficiency when training neural networks.
• Analog Memory-based Technologies: Digital systems built on 0's and 1's dominate today's computing world. Analog techniques, however, use signals that are continuously variable rather than restricted to discrete levels. An IBM research team demonstrated that large arrays of analog memory devices achieve accuracy similar to GPUs in deep learning applications.
SIMD/Vector Instructions
SOURCE: Sze, V., Chen, Y.H., Yang, T.J. and Emer, J.S., 2020. Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture, 15(2), pp.1-341.
AI Accelerators Design Approaches
• A multicore processor is a single integrated circuit with two or more separate processing
units, called cores, each of which reads and executes program instructions.
• The instructions are ordinary CPU instructions (such as add, move data, and branch) but the
single processor can run instructions on separate cores at the same time, increasing overall
speed for programs that support multithreading or other parallel computing techniques.
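A minimal sketch of this idea in Python (the workload and core count are arbitrary examples): independent chunks of a CPU-bound computation are mapped onto separate cores with a process pool, so the chunks execute at the same time.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    """CPU-bound work: sum of squares over a half-open range."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    # Each chunk runs on a separate core; partial results are combined at the end.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```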
AI Accelerators Design Approaches
SOURCE: https://steemit.com/gridcoin/@dutch/hardware-and-project-selection-part-1-cpu-vs-gpu
SOURCE: https://www.nextplatform.com/2019/07/10/a-decade-of-accelerated-computing-augurs-well-for-gpus/
SOURCE: Aamodt, T.M., Fung, W.W.L. and Rogers, T.G., 2018. General-purpose graphics
processor architectures. Synthesis Lectures on Computer Architecture, 13(2), pp.1-140.
AI Accelerators Design Approaches
SOURCE: https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664
AI Accelerators Design Approaches
• GPUs are the current workhorses for DNNs’ inference and especially training.
• Limitations: bandwidth, latency, and branch prediction. For chaotic control flow, a GPU can become even slower than a CPU.
• FPGAs are relatively cost-effective, with a short time-to-market, and the design flow is simple.
• However, FPGAs cannot be fully optimized for the varied requirements of different applications; they are less energy-efficient and deliver lower performance than ASICs.
• On the contrary, ASICs must be designed and produced for a specific application and cannot be changed over time.
• The design flow is consequently more complex and the production cost higher, but the resulting chip is highly optimized and energy-efficient.
SOURCE: Mao, W., Xiao, Z., Xu, P., Ren, H., Liu, D., Zhao, S., An, F. and Yu, H., 2020, September. Energy-Efficient Machine Learning Accelerator for Binary Neural Networks. In Proceedings of the 2020 on Great Lakes Symposium on VLSI (pp. 77-82).
AI Accelerators Design Approaches
(Figures: an AI hardware accelerator for DNNs, implemented on ASIC or FPGA, with its processing elements; a general FPGA architecture)
SOURCE: Capra, M., Bussolino, B., Marchisio, A., Shafique, M., Masera, G. and Martina, M., 2020.
An updated survey of efficient hardware architectures for accelerating deep convolutional neural
networks. Future Internet, 12(7), p.113.
SOURCE: Skliarova, I. and Sklyarov, V., 2019. FPGA-based Hardware Accelerators. Springer International Publishing.
SOURCE: Mao, W., Xiao, Z., Xu, P., Ren, H., Liu, D., Zhao, S., An, F. and Yu, H., 2020, September. Energy-Efficient Machine Learning
Accelerator for Binary Neural Networks. In Proceedings of the 2020 on Great Lakes Symposium on VLSI (pp. 77-82).
AI Accelerators Design Approaches
SOURCE: Mao, W., Xiao, Z., Xu, P., Ren, H., Liu, D., Zhao, S., An, F. and Yu, H., 2020, September. Energy-Efficient Machine Learning Accelerator for Binary Neural Networks. In Proceedings of the 2020 on Great Lakes Symposium on VLSI (pp. 77-82).
AI Accelerators Design Approaches
Architecture of the Google Tensor Processing Units (TPUs)
SOURCE: Karras, K., Pallis, E., Mastorakis, G. et al. A Hardware Acceleration Platform for AI-Based Inference at the Edge.
Circuits Syst Signal Process 39, 1059–1070 (2020). https://doi.org/10.1007/s00034-019-01226-7
AI Accelerators Design Approaches
• Nvidia (GPU)
• Google (TPU)
• Microsoft (BrainWave)
• Amazon (Inferentia)
• Facebook
• Alibaba, Baidu
AI Accelerators Design Approaches
SOURCE: Cho, K., Lee, I., Lim, H. and Kang, S., 2020. Efficient systolic-array redundancy architecture for offline/online repair. Electronics, 9(2), p.338.
AI Accelerators Design Approaches
Systolic Array-based DNN Accelerator
SOURCE: Zhang, J., Rangineni, K., Ghodsi, Z. and Garg, S., 2018, June. Thundervolt: enabling aggressive voltage underscaling and timing error resilience for energy efficient deep learning accelerators. In Proceedings of the 55th Annual Design Automation Conference (pp. 1-6).
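To make the systolic dataflow concrete, here is a minimal cycle-level Python sketch (illustrative only, not the design from the cited paper) of an output-stationary systolic array: operands stream in skewed from the left and top, and each processing element accumulates one MAC per cycle.

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Output-stationary systolic array: PE (i, j) accumulates A[i, :] . B[:, j].
    The skew t - i - j models operands arriving one hop (cycle) later per PE."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for t in range(n + m + k):          # enough cycles to drain the wavefront
        for i in range(n):
            for j in range(m):
                s = t - i - j           # operand pair reaching PE (i, j) now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A, B = np.random.rand(3, 4), np.random.rand(4, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```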
AI Accelerators Design Approaches
SOURCE: Chen, Y., Xie, Y., Song, L., Chen, F. and Tang, T., 2020. A survey of accelerator architectures for deep neural networks. Engineering, 6(3), pp.264-274.
AI Accelerators Design Approaches
(Figures: the von Neumann bottleneck for AI; increasing memory bandwidth via 3D stacked memory)
SOURCE: http://ictconference.kr/2020ict/sub/pdf/006.pdf.
AI Accelerators Design Approaches
Advantage of High Bandwidth Memory
SOURCE: http://ictconference.kr/2020ict/sub/pdf/006.pdf.
AI Accelerators Design Approaches
SOURCE: http://ictconference.kr/2020ict/sub/pdf/006.pdf.
AI Accelerators Design Approaches
Processing-In-Memory (PIM)
SOURCE: Sze, V., Chen, Y.H., Yang, T.J. and Emer, J.S., 2020. Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture, 15(2), pp.1-341.
AI Accelerators Design Approaches
Processing-In-Memory (PIM)
SOURCE: Sze, V., Chen, Y.H., Yang, T.J. and Emer, J.S., 2020. Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture, 15(2), pp.1-341.
AI Accelerators Design Approaches
(Figure: memory array organization with word lines and bit lines)
SOURCE: Sze, V., Chen, Y.H., Yang, T.J. and Emer, J.S., 2020. Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture, 15(2), pp.1-341.
AI Accelerators Design Approaches
MAC Operations in Resistive NVM Device MAC Operations in Floating Gate NVM Device
SOURCE: Sze, V., Chen, Y.H., Yang, T.J. and Emer, J.S., 2020. Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture, 15(2), pp.1-341.
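The analog MAC idea behind these figures can be sketched numerically: weights are stored as device conductances, inputs are applied as word-line voltages, and each bit line sums the resulting cell currents (Ohm's law plus Kirchhoff's current law). A minimal Python sketch with made-up conductance and voltage values:

```python
import numpy as np

# Weights stored as cell conductances G (siemens); one column per bit line.
G = np.array([[1.0e-6, 2.0e-6],
              [3.0e-6, 0.5e-6],
              [2.5e-6, 1.5e-6]])

# Inputs applied as word-line voltages V (volts), one per row.
V = np.array([0.2, 0.1, 0.3])

# Each cell contributes I = G * V; the bit line sums its column's currents,
# so the bit-line current vector is the analog MAC result V^T G.
I_bitline = V @ G
print(I_bitline)  # one multiply-accumulate result per bit line, in amperes
```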
AI Accelerators Design Approaches
Neural Processing Unit (NPU)
(Figures: single PE; eight-PE NPU)
SOURCE: Chen, Y., Xie, Y., Song, L., Chen, F. and Tang, T., 2020. A survey of accelerator architectures for deep neural networks. Engineering, 6(3), pp.264-274.
AI Accelerators Design Approaches
Neuromorphic Chip
Address-Event-Representation
SOURCE: Schuller, I.K., Stevens, R., Pino, R. and Pechan, M., 2015. Neuromorphic computing–from materials research to systems architecture roundtable. USDOE Office of Science (SC)(United States).
AI Accelerators Design Approaches
Neuromorphic Chip
SOURCE: Schuller, I.K., Stevens, R., Pino, R. and Pechan, M., 2015. Neuromorphic computing–from materials research to systems architecture roundtable. USDOE Office of Science (SC)(United States).
AI Accelerators Design Approaches
SOURCE: M. Jerry., et al., "Ferroelectric FET analog synapse for acceleration of deep neural network training.", IEEE IEDM 2017.
AI Accelerators Design Approaches
(a) Optical micrograph of the integrated memristive neural network. (d) Scanning electron micrograph of a single diffusive memristor junction. (e) Transmission electron microscopy image of the diffused memristor.
SOURCE: M. Jerry., et al., "Ferroelectric FET analog synapse for acceleration of deep neural network training.", IEEE IEDM 2017.
AI Accelerators Design Approaches
Domain-specific hardware accelerators
• Accelerators have been designed for various tasks, such as graphics, deep learning, simulation, bioinformatics, and image processing.
• A domain-specific accelerator is specialized for a particular domain of applications.
Accelerators exploit four main techniques for performance and efficiency gains:
Data specialization: Specialized operations on domain-specific data types can do in one cycle what may take tens of cycles on a conventional computer (see the sketch after this list).
Parallelism: High degrees of parallelism, often exploited at several levels, provide gains in performance.
Local and optimized memory: By storing key data structures in many small, local memories, very high memory bandwidth can be achieved at low cost and energy.
Reduced overhead: Specializing hardware eliminates the overhead of program interpretation.
SOURCE: Dally, W.J., Turakhia, Y. and Han, S., 2020. Domain-specific hardware accelerators. Communications of the ACM, 63(7), pp.48-57.
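A rough illustration of data specialization (my example, not from the cited article): many DL accelerators operate natively on int8, so activations and weights are quantized once, and the dot product then runs in cheap integer MACs while closely approximating the float result.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric linear quantization to int8; returns values and scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

x = np.random.randn(64).astype(np.float32)
w = np.random.randn(64).astype(np.float32)
qx, sx = quantize_int8(x)
qw, sw = quantize_int8(w)

# int8 MACs accumulate in int32 on real hardware; rescale once at the end.
approx = np.dot(qx.astype(np.int32), qw.astype(np.int32)) * sx * sw
print(float(np.dot(x, w)), float(approx))  # the two values closely agree
```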
AI Accelerators Design Approaches
Comparison of computation efficiency (in Tasks/s-Watt) for CPU, FPGA, GPU, and ASIC
for Deep learning and Genomics domains
SOURCE: Dally, W.J., Turakhia, Y. and Han, S., 2020. Domain-specific hardware accelerators. Communications of the ACM, 63(7), pp.48-57.
AI Accelerators Design Approaches
AI Emulators
OpenVINO™ toolkit:
• Enables CNN-based deep learning inference on the edge.
• Supports heterogeneous execution across Intel® hardware: CPU, integrated graphics, Neural Compute Stick, and Vision Accelerator Design with Intel® Movidius™ Vision Processing Units (VPUs).
• It speeds time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels.
• It includes optimized calls for computer vision standards, including OpenCV* and OpenCL™.
SOURCE: https://docs.openvinotoolkit.org/latest/index.html.
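A minimal usage sketch of the toolkit (hedged: API names follow the 2022-era openvino.runtime Python API and vary between releases; the model files and input shape are hypothetical):

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")          # IR files from the Model Optimizer
compiled = core.compile_model(model, "CPU")   # or "GPU", "MYRIAD", ...

request = compiled.create_infer_request()
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
results = request.infer({0: dummy_input})     # inputs keyed by port index
print(next(iter(results.values())).shape)
```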
Some Leading AI Hardware Accelerators
• There is tremendous pressure on the dominant AI hardware companies to produce efficient hardware because of the technical complexity of AI algorithms.
• According to Forbes, even Intel, with numerous world-class engineers and a strong research background, needed 3 years of work to build the Nervana neural network processor.
SOURCE: https://basicmi.github.io/AI-Chip/
AI Algorithms
• NASNet-A-Large has the highest accuracy, but with higher computational complexity.
AWS Inferentia
SOURCE: https://aws.amazon.com/machine-learning/trainium/.
State-of-The-Art AI Accelerators
Cerebras Wafer Scale Engine (WSE)
• The Cerebras Systems (CS)-1 wafer is an MIMD, distributed-memory machine with a 2D-mesh interconnection fabric.
• The repeated element of the architecture is called a tile.
• A tile contains one processor core, its memory, and the router that it connects to; each core includes data structure registers and fused multiply-accumulate units.
• The routers link to the routers of the four neighboring tiles.
SOURCE: arXiv:2010.03660v1.
• The wafer contains a 7×12 array of 84 identical “die.” A die holds thousands of tiles.
• It has 18 gigabytes of on-chip memory, all accessible within a single clock cycle, and provides 9 PB/s of memory bandwidth.
• It is a huge monster, with ~1.2 trillion transistors (TSMC 16 nm process); for comparison, NVIDIA's A100 GPU contains 54 billion transistors.
State-of-The-Art AI Accelerators
Intel Nervana Neural Network Processor-T (NNP-T)
Intel NNP-T Block Diagram
Intel NNP-T Matrix Processing Units (MPU)
Intel NNP-T Tensor Processor Diagram
SOURCE: https://habana.ai/wp-content/uploads/pdf/2020/Habana%20GOYA%20Inference%20Performance%20Whitepaper%20Nov'20.pdf.
State-of-The-Art AI Accelerators
Gaudi Processor High-level Architecture
SOURCE: https://habana.ai/wp-content/uploads/pdf/2020/Habana%20GAUDI%20Training%20Whitepaper%20v1.2.pdf.
State-of-The-Art AI Accelerators
Graphcore Intelligence Processing Unit (IPU)
(Colossus MK2 GC200 IPU)
SOURCE: https://www.graphcore.ai/products/ipu.
State-of-The-Art AI Accelerators
NVIDIA GA102 GPU with 84 SMs
• GA102 is composed of Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), Raster Operators (ROPs), and memory controllers.
SOURCE: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf.
State-of-The-Art AI Accelerators
Huawei Ascend
Atlas 300I Inference Card Ascend 910 AI Processor
(Model: 3000/3010)
• Huawei has its own solutions for both training (Ascend 910) and inference (Ascend 310).
• The Atlas 300I inference card uses the Ascend 310 AI processor to unlock superior AI inference performance.
• It delivers 22 TOPS@INT8 and 11 TFLOPS@FP16 with just 8 W of power consumption.
• Ascend 910 is a highly integrated SoC AI processor suitable for AI training.
• It delivers 320 TFLOPS@FP16 and 640 TOPS@INT8 of compute performance with a max power consumption of just 310 W.
SOURCE: https://e.huawei.com/en/products/cloud-computing-dc/atlas.
State-of-The-Art AI Accelerators
(Multi-Instance GPU)
• Sparsity in deep learning reflects how the importance of individual weights evolves during the learning process and by the end of network training.
• Only a subset of weights has acquired a meaningful purpose in determining the learned output; the remaining weights are no longer needed.
SOURCE: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf.
State-of-The-Art AI Accelerators
Sparsity
• Fine-grained sparsity: zeroing out specific individual weights distributed across the neural network.
• Coarse-grained sparsity: zeroing out entire sub-networks of a neural network.
(A sketch of fine-grained pruning follows below.)
SOURCE: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf.
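One concrete instance of fine-grained sparsity on Ampere GPUs is the 2:4 pattern (two non-zero values in every group of four weights), which the sparse Tensor Cores can exploit. A minimal magnitude-pruning sketch of that pattern (illustrative, not NVIDIA's implementation):

```python
import numpy as np

def prune_2_to_4(w: np.ndarray) -> np.ndarray:
    """Keep the 2 largest-magnitude weights in each group of 4, zero the rest."""
    groups = w.copy().reshape(-1, 4)             # groups of 4 consecutive weights
    for group in groups:                         # rows are views; edits stick
        drop = np.argsort(np.abs(group))[:2]     # indices of the 2 smallest
        group[drop] = 0.0
    return groups.reshape(w.shape)

weights = np.random.randn(16).astype(np.float32)
print(prune_2_to_4(weights))  # exactly two non-zeros per group of four
```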
Opportunities
Domain-specific Architecture
• Application-specific hardware architectures are the clever choice.
• Programmable platforms are the easier solution.
Throughput/$
• AI accelerators should be cost-effective to remain on the market.
Emerging Technologies
• Newer technologies such as neuromorphic computing, processing-in-memory, and quantum computing are the future of AI accelerators.