AI and ML Accelerator Survey and Trends
power, both in embedded applications and in data centers. This paper is an update to IEEE-HPEC papers from the past three years [9]–[11]. As in past years, this paper continues with last year's focus on accelerators and processors that are geared toward deep neural networks (DNNs) and convolutional neural networks (CNNs), as they are quite computationally intense [12]. This survey focuses on accelerators and processors for inference for a variety of reasons, including that defense and national security AI/ML edge applications rely heavily on inference. We consider all of the numerical precision types that an accelerator supports, but for most of them, their best inference performance is in int8 or fp16/bf16 (IEEE 16-bit floating point or Google's 16-bit brain float).

There are many surveys [13]–[24] and other papers that cover various aspects of AI accelerators. For instance, the first paper in this multi-year survey included the peak performance of FPGAs for certain AI models; however, several of the aforementioned surveys cover FPGAs in depth, so they are no longer included in this survey. This multi-year survey effort and this paper focus on gathering a comprehensive list of AI accelerators with their computational capability, power efficiency, and ultimately the computational effectiveness of utilizing accelerators in embedded and data center applications. Along with this focus, this paper mainly compares neural network accelerators that are useful for government and industrial sensor and data processing applications. A few accelerators and processors that were included in previous years' papers have been left out of this year's survey. They have been dropped because they have been surpassed by new accelerators from the same company, because they are no longer offered, or because they are no longer relevant to the topic.

II. SURVEY OF PROCESSORS

Many recent advances in AI can be at least partly credited to advances in computing hardware [6], [7], [25], [26], enabling computationally heavy machine-learning algorithms and in particular DNNs. This survey gathers performance and power information from publicly available materials, including research papers, technical trade press, company benchmarks, etc. While there are ways to access information from companies and startups (including those in their silent period), this information is intentionally left out of this survey; such data will be included in this survey when it becomes publicly available. The key metrics of this public data are plotted in Figure 2, which graphs recent processor capabilities (as of July 2022), mapping peak performance vs. power consumption. The dash-dotted box depicts the very dense region that is zoomed in and plotted in Figure 3.

The x-axis indicates peak power, and the y-axis indicates peak giga-operations per second (GOps/s), both on a logarithmic scale. The computational precision of the processing capability is depicted by the geometric shape used; the computational precision spans from analog and single-bit int1 to four-byte int32, and from two-byte fp16 to eight-byte fp64. The precisions that show two types denote the precision of the multiplication operations on the left and the precision of the accumulate/addition operations on the right (for example, fp16.32 corresponds to fp16 for multiplication and fp32 for accumulate/add). The form factor is depicted by color, which shows the package for which peak power is reported: blue corresponds to a single chip; orange corresponds to a card; and green corresponds to entire systems (single-node desktop and server systems). This survey is limited to single-motherboard, single-memory-space systems. Finally, the hollow geometric objects are peak performance for inference-only accelerators, while the solid geometric figures are performance for accelerators that are designed to perform both training and inference.

The survey begins with the same scatter plot that we have compiled for the past three years. As we did last year, to save space, we have summarized some of the important metadata of the accelerators, cards, and systems in Table I, including the label used in Figure 2 for each of the points on the graph; many of the points were brought forward from last year's plot, and some details of those entries are in [9]. There are several additions, which we will cover below. In Table I, most of the columns and entries are self-explanatory. However, there are two Technology entries that may not be: dataflow and PIM. Dataflow processors are custom-designed processors for neural network inference and training. Since neural network training and inference computations can be laid out entirely deterministically, they are amenable to dataflow processing, in which computations, memory accesses, and inter-ALU communication actions are explicitly/statically programmed or "placed-and-routed" onto the computational hardware. Processor-in-memory (PIM) accelerators integrate processing elements with memory technology. Among such PIM accelerators are those based on an analog computing technology that augments flash memory circuits with in-place analog multiply-add capabilities. Please refer to the references for the Mythic and Gyrfalcon accelerators for more details on this innovative technology.

Finally, a reasonable categorization of accelerators follows their intended application, and the five categories are shown as ellipses on the graph, which roughly correspond to performance and power consumption: Very Low Power for speech processing, very small sensors, etc.; Embedded for cameras, small UAVs and robots, etc.; Autonomous for driver assist services, autonomous driving, and autonomous robots; Data Center Chips and Cards; and Data Center Systems.

For most of the accelerators, their descriptions and commentaries have not changed since last year, so please refer to the last two years' papers for those descriptions and commentaries. There are, however, several new releases that were not covered by past papers that are covered here.

• Axelera, a Dutch embedded systems startup, reported the results of an embedded test chip that they have produced [37]. They claim both digital and analog design capabilities, and this test chip was made to test the extent of the digital design capabilities. They expect to add analog (probably flash) design elements in upcoming efforts.

• Maxim Integrated has released a system-on-chip (SoC) for ultra-low-power applications called the MAX78000 [84]–[86], which includes an ARM CPU core, a RISC-V CPU core, and an AI accelerator.

978-1-6654-9786-2/22/$31.00 ©2022 IEEE
[Figure 2 appears here: "Neural Network Processing Performance" scatter plot of Peak Performance (GOps/sec) vs. Peak Power (W), both on log scales, with legend for computation precision (analog, int1 through int32, fp16 through fp64), form factor (chip, card, system), and computation type (inference, training).]
Fig. 2: Peak performance vs. power scatter plot of publicly announced AI accelerators and processors.
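As an aside, the two-precision legend convention (e.g., fp16.32: fp16 multiplies with fp32 accumulation) can be emulated in a few lines of NumPy. This is our illustrative sketch, not code from the survey; the vector length and random data are arbitrary.

```python
import numpy as np

# Emulate the fp16.32 convention from Figure 2's legend: multiplications
# in fp16, accumulation in fp32. (Illustrative sketch only.)
rng = np.random.default_rng(0)
a = rng.random(4096, dtype=np.float32).astype(np.float16)
b = rng.random(4096, dtype=np.float32).astype(np.float16)

products = a * b  # fp16 x fp16 -> products rounded to fp16
acc_fp16_32 = products.astype(np.float32).sum(dtype=np.float32)  # fp32 accumulate
acc_fp16_only = products.sum(dtype=np.float16)  # naive all-fp16 accumulation

# fp64 reference dot product over the same (already fp16-rounded) inputs
reference = np.dot(a.astype(np.float64), b.astype(np.float64))
```

Carrying the accumulator in fp32 keeps the roughly 4096-term sum close to the fp64 reference, while an all-fp16 accumulator loses low-order bits once the running sum grows large; this is why inference datapaths commonly pair narrow multipliers with wider accumulators.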
[Figure 3 appears here: zoomed view of the dense region of Figure 2.]
Fig. 3: Zoomed region of peak performance vs. power scatter plot.

The ARM core is for quick prototyping and code reuse, while the RISC-V core is included to enable optimizing for the lowest power utilization. The AI accelerator has 64 parallel processors and supports 1-bit, 2-bit, 4-bit, and 8-bit integer operations. The SoC operates at a maximum of 30 mW and is intended for low-latency, battery-powered applications.

• Tachyum came out of startup stealth mode in 2017, and they just recently announced the release of an evaluation board for their Prodigy all-in-one processor [128]. They

• NVIDIA announced its next-generation GPU, Hopper (H100), in March 2022 [98]. It features even more symmetric multiprocessors (SIMD and Tensor cores), 50% higher memory bandwidth, and a 700W power budget for the SXM mezzanine card instance (the PCIe card power budget is 450W).

• Over the past couple of years, NVIDIA has also announced and released several system platforms for automotive, robotic, and other embedded applications that deploy the Ampere-generation GPU architecture. Specifically for automotive applications, the DRIVE AGX platform added two new systems: the DRIVE AGX L2, which enables Level 2 autonomous driving within a 45W power envelope, and the DRIVE AGX L5, which is intended to enable Level 5 autonomous driving within an 800W power envelope [103]. Similarly, the Jetson AGX Orin and Jetson NX Orin also use an Ampere-generation GPU and are intended for robotics, factory automation, etc. [100], [101]; they consume a maximum of 60W and 25W peak power, respectively.

• Graphcore shared rough peak performance numbers for their second-generation accelerator chip, the CG200 [59], [129], [130]. Since it is deployed on a PCIe card, we can assume that the peak power is around 300W. In the past year, Graphcore also announced its Bow accelerator, which is the first wafer-on-wafer processor, designed in cooperation with TSMC. The accelerator itself is the same CG200 as mentioned above, but it is mated
with a second wafer that greatly improves power and clock distribution throughout the CG200 chip [60]. This translates into 40% better performance and 16% better performance-per-Watt.

• Almost a year after Google announced details of their fourth-generation inference-only TPU4i accelerator in June 2021 [54], Google shared details about their fourth-generation training accelerator, TPUv4. Very few details were announced, but they did share peak power and performance numbers [55]. As with previous TPU variants, TPU4 is available through the Google Compute Cloud and for internal operations.

TABLE I: List of accelerator labels for plots.

Company | Product | Label | Technology | Form Factor | References
Achronix | VectorPath S7t-VG6 | Achronix | dataflow | Card | [27]
Aimotive | aiWare3 | Aimotive | dataflow | Chip | [28]
AIStorm | AIStorm | AIStorm | dataflow | Chip | [29]
Alibaba | Alibaba | Alibaba | dataflow | Card | [30]
AlphaIC | RAP-E | AlphaIC | dataflow | Chip | [31]
Amazon | Inferentia | AWS | dataflow | Card | [32], [33]
ARM | Ethos N77 | Ethos | dataflow | Chip | [36]
Axelera | Axelera Test Core | Axelera | dataflow | Chip | [37]
Baidu | Baidu Kunlun 818-300 | Baidu | dataflow | Chip | [38]–[40]
Bitmain | BM1880 | Bitmain | dataflow | Chip | [41]
Blaize | El Cano | Blaize | dataflow | Card | [42]
Canaan | Kendryte K210 | Kendryte | CPU | Chip | [45]
Cerebras | CS-1 | CS-1 | dataflow | System | [46]
Cerebras | CS-2 | CS-2 | dataflow | System | [47]
Cornami | Cornami | Cornami | dataflow | Chip | [48]
Enflame | Cloudblazer T10 | Enflame | CPU | Card | [49]
Google | TPU Edge | TPUedge | dataflow | System | [51]
Google | TPU1 | TPU1 | dataflow | Chip | [52], [53]
Google | TPU2 | TPU2 | dataflow | Chip | [52], [53]
Google | TPU3 | TPU3 | dataflow | Chip | [52]–[54]
Google | TPU4i | TPU4i | dataflow | Chip | [54]
Google | TPU4 | TPU4 | dataflow | Chip | [55]
GraphCore | C2 | GraphCore | dataflow | Card | [56], [57]
GraphCore | C2 | GraphCoreNode | dataflow | System | [58]
GraphCore | Colossus Mk2 | GraphCore2 | dataflow | Card | [59]
GraphCore | Bow-2000 | GraphCoreBow | dataflow | Card | [60]
GreenWaves | GAP8 | GAP8 | dataflow | Chip | [61], [62]
GreenWaves | GAP9 | GAP9 | dataflow | Chip | [61], [62]
Groq | Groq Node | GroqNode | dataflow | System | [63]
Groq | Tensor Streaming Processor | Groq | dataflow | Card | [56], [64]
Gyrfalcon | Gyrfalcon | Gyrfalcon | PIM | Chip | [65]
Gyrfalcon | Gyrfalcon | GyrfalconServer | PIM | System | [66]
Habana | Gaudi | Gaudi | dataflow | Card | [67], [68]
Habana | Goya HL-1000 | Goya | dataflow | Card | [68], [69]
Hailo | Hailo | Hailo-8 | dataflow | Chip | [70]
Horizon Robotics | Journey2 | Journey2 | dataflow | Chip | [71]
Huawei HiSilicon | Ascend 310 | Ascend-310 | dataflow | Chip | [72]
Huawei HiSilicon | Ascend 910 | Ascend-910 | dataflow | Chip | [73]
Intel | Arria 10 1150 | Arria | FPGA | Chip | [74], [75]
Intel Mobileye | EyeQ5 | EyeQ5 | dataflow | Chip | [42]
Kalray | Coolidge | Kalray | manycore | Chip | [80], [81]
Kneron | KL720 | KL720 | dataflow | Chip | [83]
Maxim | Max 78000 | Maxim | dataflow | Chip | [84]–[86]
Mythic | M1076 | Mythic76 | PIM | Chip | [88]–[90]
Mythic | M1108 | Mythic108 | PIM | Chip | [88]–[90]
NovuMind | NovuTensor | NovuMind | dataflow | Chip | [91], [92]
NVIDIA | Ampere A10 | A10 | GPU | Card | [93]
NVIDIA | Ampere A100 | A100 | GPU | Card | [94]
NVIDIA | Ampere A30 | A30 | GPU | Card | [93]
NVIDIA | Ampere A40 | A40 | GPU | Card | [93]
NVIDIA | DGX Station | DGX-Station | GPU | System | [95]
NVIDIA | DGX-1 | DGX-1 | GPU | System | [95], [96]
NVIDIA | DGX-2 | DGX-2 | GPU | System | [96]
NVIDIA | DGX-A100 | DGX-A100 | GPU | System | [97]
NVIDIA | H100 | H100 | GPU | Card | [98]
NVIDIA | Jetson AGX Xavier | XavierAGX | GPU | System | [99]
NVIDIA | Jetson NX Orin | OrinNX | GPU | System | [100], [101]
NVIDIA | Jetson AGX Orin | OrinAGX | GPU | System | [100], [101]
NVIDIA | Jetson TX1 | Jetson1 | GPU | System | [102]
NVIDIA | Jetson TX2 | Jetson2 | GPU | System | [102]
NVIDIA | Jetson Xavier NX | XavierNX | GPU | System | [99]
NVIDIA | DRIVE AGX L2 | AGX-L2 | GPU | System | [103]
NVIDIA | DRIVE AGX L5 | AGX-L5 | GPU | System | [103]
NVIDIA | Pascal P100 | P100 | GPU | Card | [104], [105]
NVIDIA | T4 | T4 | GPU | Card | [106]
NVIDIA | Volta V100 | V100 | GPU | Card | [105], [107]
Perceive | Ergo | Perceive | dataflow | Chip | [108]
Preferred Networks | MN-3 | Preferred-MN-3 | multicore | Card | [110], [111]
Quadric | q1-64 | Quadric | dataflow | Chip | [112]
Qualcomm | Cloud AI 100 | Qcomm | dataflow | Card | [113], [114]
Rockchip | RK3399Pro | RK3399Pro | dataflow | Chip | [115]
SiMa.ai | SiMa.ai | SiMa.ai | dataflow | Chip | [116]
Syntiant | NDP101 | Syntiant | PIM | Chip | [117], [118]
Tachyum | Prodigy | Tachyum | CPU | Chip | [119]
Tenstorrent | Tenstorrent | Tenstorrent | multicore | Card | [120]
Tesla | Tesla Full Self-Driving Computer | Tesla | dataflow | System | [121], [122]
Texas Instruments | TDA4VM | TexInst | dataflow | Chip | [123]–[125]
Toshiba | 2015 | Toshiba | multicore | System | [126]
Untether | TsunAImi | TsunAImi | PIM | Card | [127]

Next, we must mention accelerators that do not appear on Figure 2 yet. Each has been released with some benchmark results but either no peak performance numbers or no peak power numbers.

• After last year releasing some impressive benchmark results for their reconfigurable AI accelerator technology [131], and this year publishing two deeper technology reveals [132], [133] and an applications paper with Argonne National Laboratory [134], SambaNova still has not provided any details from which we can estimate the peak performance or power consumption of their solutions.

• In May 2022, Intel's Habana Labs announced the second generations of the Goya inference accelerator and the Gaudi training accelerator, named Greco and Gaudi2, respectively [135], [136]. Both promised multiple times better performance than their predecessors. Greco will be a single-width PCIe card drawing 75W, while the Gaudi2 will continue to be a double-width PCIe card drawing 650W (likely on a PCIe 5.0 slot). Habana released some benchmarking comparisons to NVIDIA A100 GPUs for the Gaudi2, but peak performance numbers were not disclosed for either of these accelerators.

• Esperanto has produced a few demo chips for evaluation by Samsung and other partners [137]. The chip is reported to be a 1,000-core RISC-V processor, with each core having an AI tensor accelerator. Esperanto has published a few relative performance metrics [138], but they have not disclosed any peak power or peak performance values.

• During the Tesla AI Day event, Tesla gave some details of their custom-built Dojo accelerator and system. They did provide a peak performance of 22.6 TF fp32 per chip, but they did not report peak power draw per chip. Perhaps these details will come later [139].

Finally, there is one departure to report this year. Last year, Centaur Technology announced an x86 CPU with an integrated AI accelerator, which was realized as a 4,096-byte-wide SIMD unit. The performance estimates were competitive, but VIA Technologies, the parent company of Centaur, sold off the USA-based engineering team of the processor to Intel Corp. and seems to have ended the development of the CNS processor [140].

III. OBSERVATIONS AND TRENDS

There are several observations and comments for us to appreciate in Figure 2.

• Int8 continues to be the default numerical precision for embedded, autonomous, and data center inference applications. This precision is adequate for most AI/ML applications with a reasonable number of classes. However, some accelerators also use fp16 and/or bf16 for inference. For training, floating-point formats rather than integer representations have become the norm.

• Among the very low power chips, what is not captured is the other features beyond the machine learning accelerator on the chip. It is very common in this category and the Embedded category to release system-on-chip (SoC) solutions, which often include low-power CPU cores, audio and video analog-to-digital converters (ADCs),
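To make the int8 observation above concrete, here is a minimal symmetric int8 quantization round trip in NumPy. This is our illustrative sketch, not a scheme taken from any surveyed accelerator; production flows typically use calibrated, often per-channel, scales.

```python
import numpy as np

# Symmetric int8 quantization sketch: map the largest weight magnitude
# to 127, round to integers, and rescale on the way back. (Illustrative only.)
def quantize_int8(x):
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.5, size=1000).astype(np.float32)  # toy "weights"
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Round-trip error is bounded by half a quantization step (s / 2).
max_err = float(np.max(np.abs(w - w_hat)))
```

For weight distributions with a reasonable dynamic range, this half-step error bound is why int8 is adequate for most inference workloads, while training's small gradient updates push toward the wider floating-point formats discussed below.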
[Figure 4 appears here: (a) neural network peak performance over the past decade, compared across precisions, vs. release date (2012–2022); (b) peak performance and fabrication technology vs. release date.]
Fig. 4: Trends with respect to release date for a subset of publicly announced AI accelerators and processors.

A. Broader Trends

We also collected release dates, fabrication technology, and peak performance for multiple precisions for a smaller subset of the accelerators listed in Table I. We were curious about the trends of peak performance over the past ten years and how numerical precision and fabrication technology influenced them. These data are plotted in Figure 4. Figure 4a plots the release date of a number of accelerators versus their peak performance for one or more precision formats. There are marked gains in peak performance for each of the precision formats, but within each format the maximum gain is 1.5 orders of magnitude over the 10-year period. In Figure 4b, we plot the release date versus the fabrication technology used for the accelerator. The default precision for the peak performance values is int8; however, there are a number of accelerators (e.g., NVIDIA K20, K80 and AMD MI8) which did not have int8 support. For these accelerators, the peak performance is reported for the lowest precision that the accelerator supported. This plot shows that much performance has been gained over the past ten years by supporting lower precision formats; it is particularly interesting to observe how support for lower precision formats was included in these accelerators as research and industry explore the effectiveness of lower floating point and integer formats in CNN/DNN inference and training.

We have several more observations and trends that are not yet captured in graphs. First, the exploration for the best numerical formats for inference and training continues. For inference, some discussion continues as to whether int4 will be acceptable for embedded inference, and the Maxim MAX78000 SoC solution supports 1-bit, 2-bit, 4-bit, and 8-bit integer weights [85]. On the training side, it has been announced that NVIDIA Hopper, Intel Gaudi2, and a future GraphCore accelerator will support the lower-precision FP8 numerical format [142]. GraphCore posted an analysis paper on FP8 [143], including trade-off analyses of scaled integer versus floating point representations, different 8-bit floating point representations, and mixed-representation DNN model performance.

Another trend that has caught our attention is that mathematical kernels other than DNN/CNN models have been implemented on several dataflow accelerators. These dataflow accelerators generally handle each data item independently (i.e., there are no cache lines), and data movement and computational operations are explicitly/statically programmed or
"placed-and-routed" onto the computational hardware (as mentioned previously). Hence, they are amenable to implementing other mathematical kernels for digital signal processing, physical simulation like computational fluid dynamics and weather simulation, and massive graph processing. Cerebras demonstrated the mapping of fast stencil code onto their wafer-scale processor [144], while researchers from the University of Bristol demonstrated stencil codes and image processing using a GraphCore IPU [145]. A team from Citadel Enterprise America also reported on a series of HPC microbenchmarks that they ran on GraphCore IPUs [146]. Google Research has been very busy demonstrating their TPUs on a variety of parallel HPC applications including flood prediction [147], large-scale distributed linear algebra [148], molecular dynamics simulation [149], fast Fourier transforms [150], [151], MRI reconstruction [152], financial Monte Carlo simulations [153], and Monte Carlo simulation of the Ising model [154]. We see this as a foreshadowing of more interesting research and development in using these high-performance accelerators.

B. Other Technologies

The word neuromorphic has become a nebulous term. In industry, it seems to have settled on any computational circuit that in some way mimics some aspects of how the synapses in brains work. When this is applied most broadly, it encompasses many if not all of the accelerators that this series of papers surveys. In academia and the broader research world, neuromorphic computing is the research, design, and development of computational hardware that models functionality and processes in brains, including chemical processes and electrical processes [155], [156]. These brain process simulation efforts have spanned the past four decades, but there is only a modest overlap with the accelerators that are captured in these surveys.

One clear overlap is circuitry based on spiking neural networks, which is what we will focus on here. Intel probably has the most extensive research program for evaluating the commercial viability of spiking neural network accelerators with their Loihi technology [157], [158] and the Intel Neuromorphic Development Community [159]. Among the applications that have been explored with Loihi are target classification in synthetic aperture radar and optical imagery [160], automotive scene analysis [161], and a spectrogram encoder [158]. Further, one company, Innatera, has announced a commercial spiking neural network processor [162]. They have shared an example inference benchmark demonstration [163], but they have not released peak performance or power numbers. In a related vein, some memristor technology is showing its effectiveness in simulating variable neuron-synapse functionality. However, the use of memristors in AI/ML accelerators is still very much in the research phase. A company called Knowm is working towards commercialization of a memristor-based accelerator [164], but that is probably a few years away. They do sell memristors and an evaluation kit on their website.

Progress continues to be made in building and commercializing silicon photonics for AI/ML accelerators, including an extensive survey paper [24]. Several optical/photonic startups have announced photonic inference processors, including LightMatter [165], Lightelligence [166], LightOn [167], and Optalysys [168], [169], and several of these companies have suggested that they will publish performance and power measurements later this year [170], [171]. The LightMatter, Lightelligence, and LightOn accelerators implement multiply-accumulate computations directly with Mach-Zehnder interferometers, while Optalysys uses a 2-dimensional FFT technique also based on Mach-Zehnder interferometers.

IV. SUMMARY

This paper updated the survey of deep neural network accelerators that span from extremely low power through embedded and autonomous applications to data center class accelerators for inference and training. We focused on inference accelerators and discussed some new additions for the year. The rate of announcements and releases has continued to be consistent and modest.

V. DATA AVAILABILITY

The data spreadsheets and references that have been collected for this study and its papers will be posted at https://github.com/areuther/ai-accelerators after they have cleared the release review process.

ACKNOWLEDGEMENT

We express our gratitude to Masahiro Arakawa, Bill Arcand, Bill Bergeron, David Bestor, Bob Bond, Chansup Byun, Nathan Frey, Vitaliy Gleyzer, Jeff Gottschalk, Michael Houle, Matthew Hubbell, Hayden Jananthan, Anna Klein, David Martinez, Joseph McDonald, Lauren Milechin, Sanjeev Mohindra, Paul Monticciolo, Julie Mullen, Andrew Prout, Stephan Rejto, Antonio Rosa, Matthew Weiss, Charles Yee, and Marc Zissman for their support of this work.

REFERENCES

[1] V. Gadepally, J. Goodwin, J. Kepner, A. Reuther, H. Reynolds, S. Samsi, J. Su, and D. Martinez, "AI Enabling Technologies," MIT Lincoln Laboratory, Lexington, MA, Tech. Rep., May 2019. [Online]. Available: https://arxiv.org/abs/1905.03592
[2] T. N. Theis and H.-S. P. Wong, "The End of Moore's Law: A New Beginning for Information Technology," Computing in Science Engineering, vol. 19, no. 2, pp. 41–50, Mar. 2017. [Online]. Available: https://doi.org/10.1109/MCSE.2017.29
[3] M. Horowitz, "Computing's Energy Problem (and What We Can Do About It)," in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC). IEEE, Feb. 2014, pp. 10–14. [Online]. Available: http://ieeexplore.ieee.org/document/6757323/
[4] C. E. Leiserson, N. C. Thompson, J. S. Emer, B. C. Kuszmaul, B. W. Lampson, D. Sanchez, and T. B. Schardl, "There's Plenty of Room at the Top: What Will Drive Computer Performance after Moore's Law?" Science, vol. 368, no. 6495, Jun. 2020. [Online]. Available: https://science.sciencemag.org/content/368/6495/eaam9744
[5] N. C. Thompson and S. Spanuth, "The Decline of Computers as a General Purpose Technology," Communications of the ACM, vol. 64, no. 3, pp. 64–72, Mar. 2021.
[6] J. L. Hennessy and D. A. Patterson, "A New Golden Age for Computer Architecture," Communications of the ACM, vol. 62, no. 2, pp. 48–60, Jan. 2019. [Online]. Available: http://dl.acm.org/citation.cfm?doid=3310134.3282307
[7] W. J. Dally, Y. Turakhia, and S. Han, "Domain-Specific Hardware Accelerators," Communications of the ACM, vol. 63, no. 7, pp. 48–57, Jun. 2020. [Online]. Available: https://dl.acm.org/doi/10.1145/3361682
[8] Y. LeCun, "Deep Learning Hardware: Past, Present, and Future," in 2019 IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2019, pp. 12–19.
[9] A. Reuther, P. Michaleas, M. Jones, V. Gadepally, S. Samsi, and J. Kepner, "AI Accelerator Survey and Trends," in 2021 IEEE High Performance Extreme Computing Conference (HPEC), Sep. 2021, pp. 1–9.
[10] ——, "Survey of Machine Learning Accelerators," in 2020 IEEE High Performance Extreme Computing Conference (HPEC), 2020, pp. 1–12.
[11] ——, "Survey and Benchmarking of Machine Learning Accelerators," in 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019. Institute of Electrical and Electronics Engineers Inc., Sep. 2019. [Online]. Available: https://doi.org/10.1109/HPEC.2019.8916327
[12] A. Canziani, A. Paszke, and E. Culurciello, "An Analysis of Deep Neural Network Models for Practical Applications," arXiv preprint arXiv:1605.07678, 2016. [Online]. Available: http://arxiv.org/abs/1605.07678
[13] C. S. Lindsey and T. Lindblad, "Survey of Neural Network Hardware," in SPIE 2492, Applications and Science of Artificial Neural Networks, S. K. Rogers and D. W. Ruck, Eds., vol. 2492. International Society for Optics and Photonics, Apr. 1995, pp. 1194–1205. [Online]. Available: http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1001095
[14] Y. Liao, "Neural Networks in Hardware: A Survey," Department of Computer Science, University of California, Tech. Rep., 2001. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.460.3235
[15] J. Misra and I. Saha, "Artificial Neural Networks in Hardware: A Survey of Two Decades of Progress," Neurocomputing, vol. 74, no. 1-3, pp. 239–255, Dec. 2010. [Online]. Available: https://doi.org/10.1016/j.neucom.2010.03.021
[16] V. Sze, Y. Chen, T. Yang, and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, Dec. 2017. [Online]. Available: https://doi.org/10.1109/JPROC.2017.2761740
[17] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, Efficient Processing of Deep Neural Networks. Morgan and Claypool Publishers, 2020. [Online]. Available: https://doi.org/10.2200/S01004ED1V01Y202004CAC050
[18] H. F. Langroudi, T. Pandit, M. Indovina, and D. Kudithipudi, "Digital Neuromorphic Chips for Deep Learning Inference: A Comprehensive Study," in Applications of Machine Learning, M. E. Zelinski, T. M. Taha, J. Howe, A. A. Awwal, and K. M. Iftekharuddin, Eds. SPIE, Sep. 2019, p. 9. [Online]. Available: https://doi.org/10.1117/12.2529407
[19] Y. Chen, Y. Xie, L. Song, F. Chen, and T. Tang, "A Survey of Accelerator Architectures for Deep Neural Networks," Engineering, vol. 6, no. 3, pp. 264–274, Mar. 2020. [Online]. Available: https://doi.org/10.1016/j.eng.2020.01.007
[27] G. Roos, "FPGA Acceleration Card Delivers on Bandwidth, Speed, and Flexibility," Nov. 2019. [Online]. Available: https://www.eetimes.com/fpga-acceleration-card-delivers-on-bandwidth-speed-and-flexibility/
[28] "aiWare3 Hardware IP Helps Drive Autonomous Vehicles To Production," Oct. 2018. [Online]. Available: https://aimotive.com/news/content/1223
[29] R. Merritt, "Startup Accelerates AI at the Sensor," Feb. 2019. [Online]. Available: https://www.eetimes.com/startup-accelerates-ai-at-the-sensor/
[30] T. Peng, "Alibaba's New AI Chip Can Process Nearly 80K Images Per Second," 2019. [Online]. Available: https://medium.com/syncedreview/alibabas-new-ai-chip-can-process-nearly-80k-images-per-second-63412dec22a3
[31] P. Clarke, "Indo-US Startup Preps Agent-based AI Processor," Aug. 2018. [Online]. Available: https://www.eenewsanalog.com/news/indo-us-startup-preps-agent-based-ai-processor/page/0/1
[32] J. Hamilton, "AWS Inferentia Machine Learning Processor," Nov. 2018. [Online]. Available: https://perspectives.mvdirona.com/2018/11/aws-inferentia-machine-learning-processor/
[33] C. Evangelist, "Deep Dive into Amazon Inferentia: A Custom-Built Chip to Enhance ML and AI," Jan. 2020. [Online]. Available: https://www.cloudmanagementinsider.com/amazon-inferentia-for-machine-learning-and-artificial-intelligence/
[34] ExxactCorp, "Taking a Deeper Look at AMD Radeon Instinct GPUs for Deep Learning," Dec. 2017. [Online]. Available: https://blog.exxactcorp.com/taking-deeper-look-amd-radeon-instinct-gpus-deep-learning/
[35] R. Smith, "AMD Announces Radeon Instinct MI60 & MI50 Accelerators Powered By 7nm Vega," Nov. 2018. [Online]. Available: https://www.anandtech.com/show/13562/amd-announces-radeon-instinct-mi60-mi50-accelerators-powered-by-7nm-vega
[36] D. Schor, "Arm Ethos is for Ubiquitous AI At the Edge," Feb. 2020. [Online]. Available: https://fuse.wikichip.org/news/3282/arm-ethos-is-for-ubiquitous-ai-at-the-edge/
[37] S. Ward-Foxton, "Axelera Demos AI Test Chip After Taping Out in Four Months," May 2022. [Online]. Available: https://www.eetimes.com/axelera-demos-ai-test-chip-after-taping-out-in-four-months/
[38] J. Ouyang, X. Du, Y. Ma, and J. Liu, "Kunlun: A 14nm High-Performance AI Processor for Diversified Workloads," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, Feb. 2021, pp. 50–51.
[39] R. Merritt, "Baidu Accelerator Rises in AI," Jul. 2018. [Online]. Available: https://www.eetimes.com/baidu-accelerator-rises-in-ai/
[40] C. Duckett, "Baidu Creates Kunlun Silicon for AI," Jul. 2018. [Online]. Available: https://www.zdnet.com/article/baidu-creates-kunlun-silicon-for-ai/
[41] B. Wheeler, "Bitmain SoC Brings AI to the Edge," Feb. 2019. [Online]. Available: https://www.linleygroup.com/newsletters/newsletter_detail.
[20] E. Wang, J. J. Davis, R. Zhao, H.-C. C. Ng, X. Niu, W. Luk, php%3Fnum=5975%26year=2019%26tag=3
P. Y. K. Cheung, and G. A. Constantinides, “Deep Neural [42] M. Demler, “Blaize Ignites Edge-AI Performance,” The Linley Group,
Network Approximation for Custom Hardware,” ACM Computing Tech. Rep., sep 2020. [Online]. Available: https://www.blaize.com/
Surveys, vol. 52, no. 2, pp. 1–39, may 2019. [Online]. Available: wp-content/uploads/2020/09/Blaize-Ignites-Edge-AI-Performance.pdf
https://dl.acm.org/doi/10.1145/3309551 [43] Y. Wu, “Chinese AI Chip Maker Cambricon Unveils
[21] S. Khan and A. Mann, “AI Chips: What They Are and Why They New Cloud-Based Smart Chip,” may 2018. [Online].
Matter,” Georgetown Center for Security and Emerging Technology, Available: https://www.chinamoneynetwork.com/2018/05/04/chinese-
Tech. Rep., apr 2020. [Online]. Available: https://cset.georgetown.edu/ ai-chip-maker-cambricon-unveils-new-cloud-based-smart-chip
research/ai-chips-what-they-are-and-why-they-matter/ [44] I. Cutress, “Cambricon, Maker of Hauwei’s Kirin NPU IP,
[22] U. Rueckert, “Digital Neural Network Accelerators,” in NANO-CHIPS Build a Big AI Chip and PCIe Card,” may 2018. [On-
2030: On-Chip AI for an Efficient Data-Driven World, B. Murmann line]. Available: https://www.anandtech.com/show/12815/cambricon-
and B. Hoefflinger, Eds. Springer, Cham, 2020, ch. 12, pp. 181–202. makers-of-huaweis-kirin-npu-ip-build-a-big-ai-chip-and-pcie-card
[Online]. Available: https://link.springer.com/chapter/10.1007%2F978- [45] L. Gwennap, “Kendryte Embeds AI for Surveillance,” mar
3-030-18338-7 12 2019. [Online]. Available: https://www.linleygroup.com/newsletters/
[23] T. Rogers and M. Khairy, “An Academic’s Attempt to Clear the Fog newsletter detail.php?num=5992
of the Machine Learning Accelerator War — SIGARCH,” aug 2021. [46] A. Hock, “Introducing the Cerebras CS-1, the Industry’s
[Online]. Available: https://www.sigarch.org/an-academics-attempt-to- Fastest Artificial Intelligence Computer,” nov 2019. [Online].
clear-the-fog-of-the-machine-learning-accelerator-war/ Available: https://www.cerebras.net/introducing-the-cerebras-cs-1-the-
[24] F. P. Sunny, E. Taheri, M. Nikdast, and S. Pasricha, “A Survey on industrys-fastest-artificial-intelligence-computer/
Silicon Photonics for Deep Learning,” ACM Journal on Emerging [47] T. Trader, “Cerebras Doubles AI Performance with Second-
Technologies in Computing Systems, vol. 17, no. 4, oct 2021. [Online]. Gen 7nm Wafer Scale Engine,” apr 2021. [Online].
Available: https://dl.acm.org/doi/10.1145/3459009 Available: https://www.hpcwire.com/2021/04/20/cerebras-doubles-ai-
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classifica- performance-with-second-gen-7nm-wafer-scale-engine/
tion with Deep Convolutional Neural Networks,” Neural Information [48] “Cornami Achieves Unprecedented Performance at Lowest Power
Processing Systems, vol. 25, 2012. Dissipation for Deep Neural Networks,” oct 2019. [Online]. Available:
[26] N. P. Jouppi, C. Young, N. Patil, and D. Patterson, “A Domain-Specific https://cornami.com/1416-2/
Architecture for Deep Neural Networks,” Communications of the [49] P. Clarke, “GlobalFoundries Aids Launch of Chinese AI Startup,”
ACM, vol. 61, no. 9, pp. 50–59, aug 2018. [Online]. Available: dec 2019. [Online]. Available: https://www.eenewsanalog.com/news/
http://doi.acm.org/10.1145/3154484 globalfoundries-aids-launch-chinese-ai-startup
[50] V. Mehta, "Performance Estimation and Benchmarks for Real-World Edge Inference Applications," in Linley Spring Processor Conference. Linley Group, 2020.
[51] "Edge TPU," 2019. [Online]. Available: https://cloud.google.com/edge-tpu/
[52] N. P. Jouppi, D. H. Yoon, G. Kurian, S. Li, N. Patil, J. Laudon, C. Young, and D. Patterson, "A Domain-Specific Supercomputer for Training Deep Neural Networks," Commun. ACM, vol. 63, no. 7, pp. 67–78, jun 2020. [Online]. Available: https://doi.org/10.1145/3360307
[53] P. Teich, "Tearing Apart Google's TPU 3.0 AI Coprocessor," may 2018. [Online]. Available: https://www.nextplatform.com/2018/05/10/tearing-apart-googles-tpu-3-0-ai-coprocessor/
[54] N. P. Jouppi, D. H. Yoon, M. Ashcraft, M. Gottscho, T. B. Jablin, G. Kurian, J. Laudon, S. Li, P. Ma, X. Ma, T. Norrie, N. Patil, S. Prasad, C. Young, Z. Zhou, and D. Patterson, "Ten Lessons From Three Generations Shaped Google's TPUv4i," in Proc. of 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE Computer Society, jun 2021, pp. 1–14.
[55] O. Peckham, "Google Cloud's New TPU v4 ML Hub Packs 9 Exaflops of AI," may 2022. [Online]. Available: https://www.hpcwire.com/2022/05/16/google-clouds-new-tpu-v4-ml-hub-packs-9-exaflops-of-ai/
[56] L. Gwennap, "Groq Rocks Neural Networks," Microprocessor Report, Tech. Rep., jan 2020. [Online]. Available: http://groq.com/wp-content/uploads/2020/04/Groq-Rocks-NNs-Linley-Group-MPR-2020Jan06.pdf
[57] D. Lacey, "Preliminary IPU Benchmarks," oct 2017. [Online]. Available: https://www.graphcore.ai/posts/preliminary-ipu-benchmarks-providing-previously-unseen-performance-for-a-range-of-machine-learning-applications
[58] "Dell DSS8440 Graphcore IPU Server," Graphcore, Tech. Rep., feb 2020. [Online]. Available: https://www.graphcore.ai/hubfs/Leadgenassets/DSS8440IPUServerWhitePaper_2020.pdf
[59] S. Ward-Foxton, "Graphcore Takes on Nvidia with Second-Gen AI Accelerator," jul 2020. [Online]. Available: https://www.eetimes.com/graphcore-takes-on-nvidia-with-second-gen-ai-accelerator/
[60] M. Tyson, "Graphcore Bow IPU Introduces TSMC 3D Wafer-on-Wafer Processor," mar 2022. [Online]. Available: https://www.tomshardware.com/news/graphcore-tsmc-bow-ipu-3d-wafer-on-wafer-processor
[61] "GAP Application Processors," 2020. [Online]. Available: https://greenwaves-technologies.com/gap8_gap9/
[62] J. Turley, "GAP9 for ML at the Edge," EE Journal, jun 2020. [Online]. Available: https://www.eejournal.com/article/gap9-for-ml-at-the-edge/
[63] N. Hemsoth, "Groq Shares Recipe for TSP Nodes, Systems," sep 2020. [Online]. Available: https://www.nextplatform.com/2020/09/29/groq-shares-recipe-for-tsp-nodes-systems/
[64] D. Abts, J. Ross, J. Sparling, M. Wong-VanHaren, M. Baker, T. Hawkins, A. Bell, J. Thompson, T. Kahsai, G. Kimmell, J. Hwang, R. Leslie-Hurd, M. Bye, E. R. Creswick, M. Boyd, M. Venigalla, E. Laforge, J. Purdy, P. Kamath, D. Maheshwari, M. Beidler, G. Rosseel, O. Ahmad, G. Gagarin, R. Czekalski, A. Rane, S. Parmar, J. Werner, J. Sproch, A. Macias, and B. Kurtz, "Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads," in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), may 2020, pp. 145–158. [Online]. Available: https://doi.org/10.1109/ISCA45697.2020.00023
[65] S. Ward-Foxton, "Gyrfalcon Unveils Fourth AI Accelerator Chip," nov 2019. [Online]. Available: https://www.eetimes.com/gyrfalcon-unveils-fourth-ai-accelerator-chip/
[66] "SolidRun, Gyrfalcon Develop Arm-based Edge Optimized AI Inference Server," feb 2020. [Online]. Available: https://www.hpcwire.com/off-the-wire/solidrun-gyrfalcon-develop-edge-optimized-ai-inference-server/
[67] L. Gwennap, "Habana Offers Gaudi for AI Training," Microprocessor Report, Tech. Rep., jun 2019. [Online]. Available: https://habana.ai/wp-content/uploads/2019/06/Habana-Offers-Gaudi-for-AI-Training.pdf
[68] E. Medina and E. Dagan, "Habana Labs Purpose-Built AI Inference and Training Processor Architectures: Scaling AI Training Systems Using Standard Ethernet With Gaudi Processor," IEEE Micro, vol. 40, no. 2, pp. 17–24, mar 2020. [Online]. Available: https://doi.org/10.1109/MM.2020.2975185
[69] L. Gwennap, "Habana Wins Cigar for AI Inference," feb 2019. [Online]. Available: https://www.linleygroup.com/mpr/article.php?id=12103
[70] S. Ward-Foxton, "Details of Hailo AI Edge Accelerator Emerge," aug 2019. [Online]. Available: https://www.eetimes.com/details-of-hailo-ai-edge-accelerator-emerge/
[71] "Horizon Robotics Journey2 Automotive AI Processor Series," 2020. [Online]. Available: https://en.horizon.ai/product/journey
[72] Huawei, "Ascend 310 AI Processor," 2020. [Online]. Available: https://e.huawei.com/us/products/cloud-computing-dc/atlas/ascend-310
[73] ——, "Ascend 910 AI Processor," 2020. [Online]. Available: https://e.huawei.com/us/products/cloud-computing-dc/atlas/ascend-910
[74] M. S. Abdelfattah, D. Han, A. Bitar, R. DiCecco, S. O'Connell, N. Shanker, J. Chu, I. Prins, J. Fender, A. C. Ling, and G. R. Chiu, "DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration," in 2018 28th International Conference on Field Programmable Logic and Applications (FPL), aug 2018, pp. 411–4117. [Online]. Available: https://doi.org/10.1109/FPL.2018.00077
[75] N. Hemsoth, "Intel FPGA Architecture Focuses on Deep Learning Inference," jul 2018. [Online]. Available: https://www.nextplatform.com/2018/07/31/intel-fpga-architecture-focuses-on-deep-learning-inference/
[76] J. Hruska, "New Movidius Myriad X VPU Packs a Custom Neural Compute Engine," aug 2017. [Online]. Available: https://www.extremetech.com/computing/254772-new-movidius-myriad-x-vpu-packs-custom-neural-compute-engine
[77] J. De Gelas, "Intel's Xeon Cascade Lake vs. NVIDIA Turing: An Analysis in AI," jul 2019. [Online]. Available: https://www.anandtech.com/show/14466/intel-xeon-cascade-lake-vs-nvidia-turing
[78] "Intel Xeon Platinum 8180," 2020. [Online]. Available: http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon8180.html
[79] "Intel Xeon Platinum 8280," 2020. [Online]. Available: http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon8280.html
[80] B. Dupont de Dinechin, "Kalray's MPPA® Manycore Processor: At the Heart of Intelligent Systems," in 17th IEEE International New Circuits and Systems Conference (NEWCAS). Munich: IEEE, jun 2019. [Online]. Available: https://www.european-processor-initiative.eu/dissemination-material/1259/
[81] P. Clarke, "NXP, Kalray Demo Coolidge Parallel Processor in 'BlueBox'," jan 2020. [Online]. Available: https://www.eenewsanalog.com/news/nxp-kalray-demo-coolidge-parallel-processor-bluebox
[82] S. Ward-Foxton, "Kneron's Next-Gen Edge AI Chip Gets $40m Boost," jan 2020. [Online]. Available: https://www.eetasia.com/knerons-next-gen-edge-ai-chip-gets-40m-boost/
[83] ——, "Kneron Attracts Strategic Investors," jan 2021. [Online]. Available: https://www.eetimes.com/kneron-attracts-strategic-investors/
[84] ——, "Maxim Debuts Homegrown AI Accelerator in Latest ULP SoC," nov 2020. [Online]. Available: https://www.eetimes.com/maxim-debuts-homegrown-ai-accelerator-in-latest-ulp-soc/
[85] A. Jani, "Maxim Showcases Efficient Custom AI," feb 2021. [Online]. Available: https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6274&year=2021&tag=3
[86] M. Clay, C. Grecos, M. Shirvaikar, and B. Richey, "Benchmarking the MAX78000 Artificial Intelligence Microcontroller for Deep Learning Applications," in Real-Time Image Processing and Deep Learning 2022, N. Kehtarnavaz and M. F. Carlsohn, Eds., vol. 12102, International Society for Optics and Photonics. SPIE, 2022, pp. 47–52. [Online]. Available: https://doi.org/10.1117/12.2622390
[87] T. P. Morgan, "Drilling Into Microsoft's BrainWave Soft Deep Learning Chip," aug 2017. [Online]. Available: https://www.nextplatform.com/2017/08/24/drilling-microsofts-brainwave-soft-deep-leaning-chip/
[88] S. Ward-Foxton, "Mythic Resizes its Analog AI Chip," jun 2021. [Online]. Available: https://www.eetimes.com/mythic-resizes-its-analog-ai-chip/
[89] N. Hemsoth, "A Mythic Approach to Deep Learning Inference," aug 2018. [Online]. Available: https://www.nextplatform.com/2018/08/23/a-mythic-approach-to-deep-learning-inference/
[90] D. Fick, "Mythic @ Hot Chips 2018," aug 2018. [Online]. Available: https://medium.com/mythic-ai/mythic-hot-chips-2018-637dfb9e38b7
[91] K. Freund, "NovuMind: An Early Entrant in AI Silicon," Moor Insights & Strategy, Tech. Rep., may 2019. [Online]. Available: https://moorinsightsstrategy.com/wp-content/uploads/2019/05/NovuMind-An-Early-Entrant-in-AI-Silicon-By-Moor-Insights-And-Strategy.pdf
[92] J. Yoshida, "NovuMind's AI Chip Sparks Controversy," oct 2018. [Online]. Available: https://www.eetimes.com/novuminds-ai-chip-sparks-controversy/
[93] T. P. Morgan, "Nvidia Rounds Out 'Ampere' Lineup With Two New Accelerators," apr 2021. [Online]. Available: https://www.nextplatform.com/2021/04/15/nvidia-rounds-out-ampere-lineup-with-two-new-accelerators/
[94] R. Krashinsky, O. Giroux, S. Jones, N. Stam, and S. Ramaswamy, "NVIDIA Ampere Architecture In-Depth," may 2020. [Online]. Available: https://devblogs.nvidia.com/nvidia-ampere-architecture-in-depth/
[95] P. Alcorn, "Nvidia Infuses DGX-1 with Volta, Eight V100s in a Single Chassis," may 2017. [Online]. Available: https://www.tomshardware.com/news/nvidia-volta-v100-dgx-1-hgx-1,34380.html
[96] I. Cutress, "NVIDIA's DGX-2: Sixteen Tesla V100s, 30TB of NVMe, Only $400K," mar 2018. [Online]. Available: https://www.anandtech.com/show/12587/nvidias-dgx2-sixteen-v100-gpus-30-tb-of-nvme-only-400k
[97] C. Campa, C. Kawalek, H. Vo, and J. Bessoudo, "Defining AI Innovation with NVIDIA DGX A100," may 2020. [Online]. Available: https://devblogs.nvidia.com/defining-ai-innovation-with-dgx-a100/
[98] R. Smith, "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder," mar 2022. [Online]. Available: https://www.anandtech.com/show/17327/nvidia-hopper-gpu-architecture-and-h100-accelerator-announced
[99] ——, "NVIDIA Gives Jetson AGX Xavier a Trim, Announces Nano-Sized Jetson Xavier NX," nov 2019. [Online]. Available: https://www.anandtech.com/show/15070/nvidia-gives-jetson-xavier-a-trim-announces-nanosized-jetson-xavier-nx
[100] B. Funk, "NVIDIA Jetson AGX Orin: The Next-Gen Platform That Will Power Our AI Robot Overlords Unveiled," mar 2022. [Online]. Available: https://hothardware.com/news/nvidia-jetson-agx-orin
[101] "Jetson AGX Orin for Next-Gen Robotics," 2022. [Online]. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/
[102] D. Franklin, "NVIDIA Jetson TX2 Delivers Twice the Intelligence to the Edge," mar 2017. [Online]. Available: https://developer.nvidia.com/blog/jetson-tx2-delivers-twice-intelligence-edge/
[103] B. Hill, "NVIDIA Unveils Ampere-Infused DRIVE AGX For Autonomous Cars, Isaac Robotics Platform With BMW Partnership," may 2022. [Online]. Available: https://hothardware.com/news/nvidia-drive-agx-pegasus-orin-ampere-next-gen-autonomous-cars
[104] "NVIDIA Tesla P100." [Online]. Available: https://www.nvidia.com/en-us/data-center/tesla-p100/
[105] R. Smith, "16GB NVIDIA Tesla V100 Gets Reprieve; Remains in Production," may 2018. [Online]. Available: https://www.anandtech.com/show/12809/16gb-nvidia-tesla-v100-gets-reprieve-remains-in-production
[106] E. Kilgariff, H. Moreton, N. Stam, and B. Bell, "NVIDIA Turing Architecture In-Depth," sep 2018. [Online]. Available: https://developer.nvidia.com/blog/nvidia-turing-architecture-in-depth/
[107] "NVIDIA Tesla V100 Tensor Core GPU," 2019. [Online]. Available: https://www.nvidia.com/en-us/data-center/tesla-v100/
[108] J. McGregor, "Perceive Exits Stealth With Super Efficient Machine Learning Chip For Smarter Devices," apr 2020. [Online]. Available: https://www.forbes.com/sites/tiriasresearch/2020/04/06/perceive-exits-stealth-with-super-efficient-machine-learning-chip-for-smarter-devices/#1b25ab646d9c
[109] D. Schor, "The 2,048-core PEZY-SC2 Sets a Green500 Record," nov 2017. [Online]. Available: https://fuse.wikichip.org/news/191/the-2048-core-pezy-sc2-sets-a-green500-record/
[110] "MN-Core," 2020. [Online]. Available: https://projects.preferred.jp/mn-core/en/
[111] I. Cutress, "Preferred Networks: A 500 W Custom PCIe Card using 3000 mm2 Silicon," dec 2019. [Online]. Available: https://www.anandtech.com/show/15177/preferred-networks-a-500-w-custom-pcie-card-using-3000-mm2-silicon
[112] D. Firu, "Quadric Edge Supercomputer," Quadric, Tech. Rep., apr 2019. [Online]. Available: https://quadric.io/supercomputing.pdf
[113] S. Ward-Foxton, "Qualcomm Cloud AI 100 Promises Impressive Performance per Watt for Near-Edge AI," sep 2020. [Online]. Available: https://www.eetimes.com/qualcomm-cloud-ai-100-promises-impressive-performance-per-watt-for-near-edge-ai/
[114] D. McGrath, "Qualcomm Targets AI Inferencing in the Cloud," apr 2019. [Online]. Available: https://www.eetimes.com/qualcomm-targets-ai-inferencing-in-the-cloud/
[115] "Rockchip Released Its First AI Processor RK3399Pro NPU Performance Up to 2.4TOPs," jan 2018. [Online]. Available: https://www.rock-chips.com/a/en/News/Press_Releases/2018/0108/869.html
[116] L. Gwennap, "Machine Learning Moves to the Edge," Microprocessor Report, Tech. Rep., apr 2020. [Online]. Available: https://www.linleygroup.com/uploads/sima-machine-learning-moves-to-the-edge-wp.pdf
[117] D. McGrath, "Tech Heavyweights Back AI Chip Startup," oct 2018. [Online]. Available: https://www.eetimes.com/tech-heavyweights-back-ai-chip-startup/
[118] R. Merritt, "Startup Rolls AI Chips for Audio," feb 2018. [Online]. Available: https://www.eetimes.com/startup-rolls-ai-chips-for-audio/
[119] A. Shilov, "Tachyum Teases 128-Core CPU: 5.7 GHz, 950W, 16 DDR5 Channels," jun 2022. [Online]. Available: https://www.tomshardware.com/news/tachyum-teases-128-core-cpu-57-ghz-950w-16-ddr5-channels
[120] L. Gwennap, "Tenstorrent Scales AI Performance: Architecture Leads in Data-Center Power Efficiency," Microprocessor Report, Tech. Rep., apr 2020. [Online]. Available: https://www.tenstorrent.com/wp-content/uploads/2020/04/Tenstorrent-Scales-AI-Performance.pdf
[121] E. Talpes, D. D. Sarma, G. Venkataramanan, P. Bannon, B. McGee, B. Floering, A. Jalote, C. Hsiong, S. Arora, A. Gorti, and G. S. Sachdev, "Compute Solution for Tesla's Full Self-Driving Computer," IEEE Micro, vol. 40, no. 2, pp. 25–35, mar 2020. [Online]. Available: https://doi.org/10.1109/MM.2020.2975764
[122] "FSD Chip - Tesla," 2020. [Online]. Available: https://en.wikichip.org/wiki/tesla_(car_company)/fsd_chip
[123] S. Ward-Foxton, "TI's First Automotive SoC with an AI Accelerator Launches," feb 2021. [Online]. Available: https://www.eetimes.com/tis-first-automotive-soc-with-an-ai-accelerator-launches/
[124] "TDA4VM Jacinto Processors for ADAS and Autonomous Vehicles," Texas Instruments, Tech. Rep., mar 2021. [Online]. Available: https://www.ti.com/lit/gpn/tda4vm
[125] M. Demler, "TI Jacinto Accelerates Level 3 ADAS," mar 2020. [Online]. Available: https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6130&year=2020&tag=3
[126] R. Merritt, "Samsung, Toshiba Detail AI Chips," feb 2019. [Online]. Available: https://www.eetimes.com/samsung-toshiba-detail-ai-chips/
[127] L. Gwennap, "Untether Delivers At-Memory AI," Linley Group, Tech. Rep., nov 2020. [Online]. Available: https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6230
[128] G. Hilson, "Startup Tachyum Offers Universal Processor for Evaluation," jun 2022. [Online]. Available: https://www.eetimes.com/startup-tachyum-offers-universal-processor-for-evaluation/
[129] N. Toon, "Introducing 2nd Generation IPU Systems for AI at Scale," jul 2020. [Online]. Available: https://www.graphcore.ai/posts/introducing-second-generation-ipu-systems-for-ai-at-scale
[130] I. Lunden, "Graphcore Unveils New GC200 Chip and the Expandable M2000 IPU Machine That Runs on Them," jul 2020. [Online]. Available: https://techcrunch.com/2020/07/15/graphcore-second-generation-chip/
[131] S. Ward-Foxton, "SambaNova Emerges From Stealth With Record-Breaking AI System," dec 2020. [Online]. Available: https://www.eetimes.com/sambanova-emerges-from-stealth-with-record-breaking-ai-system/
[132] R. Prabhakar, S. Jairath, and J. L. Shin, "SambaNova SN10 RDU: A 7nm Dataflow Architecture to Accelerate Software 2.0," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, 2022, pp. 350–352.
[133] R. Prabhakar and S. Jairath, "SambaNova SN10 RDU: Accelerating Software 2.0 with Dataflow," in 2021 IEEE Hot Chips 33 Symposium (HCS), aug 2021, pp. 1–37.
[134] M. Emani, V. Vishwanath, C. Adams, M. E. Papka, R. Stevens, L. Florescu, S. Jairath, W. Liu, T. Nama, and A. Sujeeth, "Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture," Computing in Science & Engineering, vol. 23, no. 2, pp. 114–119, 2021.
[135] O. Peckham, "Intel's Habana Labs Unveils Gaudi2, Greco AI Processors," may 2022. [Online]. Available: https://www.hpcwire.com/2022/05/10/intels-habana-labs-unveils-gaudi2-greco-ai-processors/
[136] T. P. Morgan, "Intel Pits New Gaudi2 AI Training Engine Against Nvidia GPUs," may 2022. [Online]. Available: https://www.nextplatform.com/2022/05/10/intel-pits-new-gaudi2-ai-training-engine-against-nvidia-gpus/
[137] D. Martin, "Samsung, Others Test Esperanto's 1,000-Core RISC-V AI Chip," apr 2022. [Online]. Available: https://www.theregister.com/2022/04/22/samsung_esperanto_riscv/
[138] K. Freund, "Esperanto Launches AI Accelerator with over 1000 RISC-V Cores," aug 2021.
[139] O. Peckham, "Enter Dojo: Tesla Reveals Design for Modular Supercomputer & D1 Chip," aug 2021. [Online]. Available: https://www.hpcwire.com/2021/08/20/enter-dojo-tesla-reveals-design-for-modular-supercomputer-d1-chip/
[140] A. Shilov, "Via Shutters Centaur Technology Site, Sells Off Equipment," dec 2021. [Online]. Available: https://www.tomshardware.com/news/via-sells-off-equipment-from-centaur-preps-to-shut-down-site
[141] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All
You Need," CoRR, vol. abs/1706.03762, 2017. [Online]. Available: http://arxiv.org/abs/1706.03762
[142] J. Burt, "Chip Makers Press For Standardized FP8 Format For AI," jul 2022. [Online]. Available: https://www.nextplatform.com/2022/07/07/chip-makers-press-for-standardized-fp8-format-for-ai/
[143] B. Noune, P. Jones, D. Justus, D. Masters, and C. Luschi, "8-bit Numerical Formats for Deep Neural Networks," arXiv preprint, jun 2022. [Online]. Available: http://arxiv.org/abs/2206.02915
[144] K. Rocki, D. van Essendelft, I. Sharapov, R. Schreiber, M. Morrison, V. Kibardin, A. Portnoy, J. F. Dietiker, M. Syamlal, and M. James, "Fast Stencil-Code Computation on a Wafer-Scale Processor," arXiv preprint, 2020.
[145] T. Louw and S. McIntosh-Smith, "Using the Graphcore IPU for Traditional HPC Applications," in 3rd Workshop on Accelerated Machine Learning (AccML), jan 2021.
[146] Z. Jia, B. Tillman, M. Maggioni, and D. P. Scarpazza, "Dissecting the Graphcore IPU Architecture via Microbenchmarking," Citadel, Chicago, Tech. Rep., dec 2019. [Online]. Available: https://arxiv.org/abs/1912.03413v1
[147] R. L. Hu, D. Pierce, Y. Shafi, A. Boral, V. Anisimov, S. Nevo, and Y.-F. Chen, "Accelerating physics simulations with tensor processing units: An inundation modeling example," The International Journal of High Performance Computing Applications, vol. 36, no. 4, pp. 510–523, 2022. [Online]. Available: https://doi.org/10.1177/10943420221102873
[148] A. G. M. Lewis, J. Beall, M. Ganahl, M. Hauru, S. B. Mallick, and G. Vidal, "Large Scale Distributed Linear Algebra With Tensor Processing Units," arXiv preprint, dec 2021. [Online]. Available: https://arxiv.org/abs/2112.09017v1
[149] P. Sharma and V. Jadhao, "Molecular Dynamics Simulations on Cloud Computing and Machine Learning Platforms," arXiv preprint, 2021.
[150] T. Lu, T. Marin, Y. Zhuo, Y.-F. Chen, and C. Ma, "Nonuniform Fast Fourier Transform on TPUs," in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021, pp. 783–787.
[151] T. Lu, Y. F. Chen, B. Hechtman, T. Wang, and J. Anderson, "Large-Scale Discrete Fourier Transform on TPUs," IEEE Access, vol. 9, pp. 93422–93432, feb 2020. [Online]. Available: https://arxiv.org/abs/2002.03260v3
[152] T. Lu, T. Marin, Y. Zhuo, Y. F. Chen, and C. Ma, "Accelerating MRI Reconstruction on TPUs," 2020 IEEE High Performance Extreme Computing Conference (HPEC 2020), sep 2020. [Online]. Available: https://arxiv.org/abs/2006.14080v1
[153] F. Belletti, D. King, K. Yang, R. Nelet, Y. Shafi, Y.-F. Chen, and J. Anderson, "Tensor Processing Units for Financial Monte Carlo," in Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing. Society for Industrial and Applied Mathematics, jun 2019, pp. 12–23. [Online]. Available: https://arxiv.org/abs/1906.02818v5
[154] K. Yang, Y. F. Chen, G. Roumpos, C. Colby, and J. Anderson, "High performance Monte Carlo simulation of Ising model on TPU clusters," in International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society, nov 2019. [Online]. Available: https://arxiv.org/abs/1903.11714v4
[155] C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, "A Survey of Neuromorphic Computing and Neural Networks in Hardware," arXiv preprint arXiv:1705.06963, may 2017. [Online]. Available: http://arxiv.org/abs/1705.06963
[156] C. D. James, J. B. Aimone, N. E. Miner, C. M. Vineyard, F. H. Rothganger, K. D. Carlson, S. A. Mulder, T. J. Draelos, A. Faust, M. J. Marinella, J. H. Naegle, and S. J. Plimpton, "A Historical Survey of Algorithms and Hardware Architectures for Neural-inspired and Neuromorphic Computing Applications," Biologically Inspired Cognitive Architectures, vol. 19, pp. 49–64, jan 2017. [Online]. Available: https://www.sciencedirect.com/science/article/abs/pii/S2212683X16300561
[157] R. F. Service, "Microchips That Mimic the Human Brain Could Make AI Far More Energy Efficient," may 2022. [Online]. Available: https://www.science.org/content/article/microchips-mimic-human-brain-could-make-ai-far-more-energy-efficient
[158] G. Orchard, E. P. Frady, D. B. D. Rubin, S. Sanborn, S. B. Shrestha, F. T. Sommer, and M. Davies, "Efficient Neuromorphic Signal Processing with Loihi 2," in 2021 IEEE Workshop on Signal Processing Systems (SiPS), oct 2021, pp. 254–259.
[160] M. Barnell, C. Raymond, M. Wilson, D. Isereau, and C. Cicotta, "Target Classification in Synthetic Aperture Radar and Optical Imagery Using Loihi Neuromorphic Hardware," in 2020 IEEE High Performance Extreme Computing Conference (HPEC), 2020, pp. 1–6.
[161] A. Viale, A. Marchisio, M. Martina, G. Masera, and M. Shafique, "CarSNN: An Efficient Spiking Neural Network for Event-Based Autonomous Cars on the Loihi Neuromorphic Research Processor," in 2021 International Joint Conference on Neural Networks (IJCNN), jul 2021, pp. 1–10.
[162] S. Ward-Foxton, "Innatera Unveils Neuromorphic AI Chip to Accelerate Spiking Networks," jul 2021. [Online]. Available: https://www.eetimes.com/innatera-unveils-neuromorphic-ai-chip-to-accelerate-spiking-networks/
[163] M. Levy, "Innatera's Spiking Neural Processor," apr 2021. [Online]. Available: https://www.linleygroup.com/newsletters/newsletter_detail.php?num=6302&year=2021&tag=3
[164] V. Ostrovskii, P. Fedoseev, Y. Bobrova, and D. Butusov, "Structural and Parametric Identification of Knowm Memristors," Nanomaterials, vol. 12, no. 1, jan 2022. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8746671/
[165] S. Ward-Foxton, "Optical Compute Promises Game-Changing AI Performance," aug 2020. [Online]. Available: https://www.eetimes.com/optical-compute-promises-game-changing-ai-performance/
[166] ——, "Optical Chip Solves Hardest Math Problems Faster than GPUs," dec 2021. [Online]. Available: https://www.eetimes.com/optical-computing-chip-runs-hardest-math-problems-100x-faster-than-gpus/
[167] J. Launay, I. Poli, K. Müller, I. Carron, L. Daudet, F. Krzakala, and S. Gigan, "Light-in-the-Loop: Using a Photonics Co-Processor for Scalable Training of Neural Networks," arXiv preprint, jun 2020. [Online]. Available: https://arxiv.org/abs/2006.01475v2
[168] E. Cottle, F. Michel, J. Wilson, N. New, and I. Kundu, "Optical Convolutional Neural Networks – Combining Silicon Photonics and Fourier Optics for Computer Vision," arXiv preprint, dec 2020. [Online]. Available: https://arxiv.org/abs/2103.09044v1
[169] J. Wilson, "The Multiply and Fourier Transform Unit: A Micro-Scale Optical Processor," Optalysys, Tech. Rep., dec 2020. [Online]. Available: https://optalysys.com/s/Multiply_and_Fourier_Transform_white_paper_12_12_20.pdf
[170] D. Schneider, "A Neural-Net Based on Light Could Best Digital Computers," jun 2019. [Online]. Available: https://spectrum.ieee.org/a-neural-net-based-on-light-could-best-digital-computers
[171] C. Q. Choi, "Photonic Chip Performs Image Recognition at the Speed of Light," jun 2022. [Online]. Available: https://spectrum.ieee.org/photonic-neural-network
[159] M. Davies, A. Wild, G. Orchard, Y. Sandamirskaya, G. A. F. Guerra, P. Joshi, P. Plank, and S. R. Risbud, "Advancing Neuromorphic
Computing With Loihi: A Survey of Results and Outlook,” Proceedings
of the IEEE, vol. 109, no. 5, pp. 911–934, may 2021.