Hardware Accelerator Design for Machine Learning

Li Du and Yuan Du
http://dx.doi.org/10.5772/intechopen.72845
Abstract
Machine learning is widely used in many modern artificial intelligence applications, and various hardware platforms have been implemented to support such applications. Among them, the graphics processing unit (GPU) is the most widely used one due to its fast computation speed and compatibility with various algorithms. Field programmable gate arrays (FPGAs) show better energy efficiency than GPUs when computing machine learning algorithms, at the cost of lower speed. Various application-specific integrated circuit (ASIC) architectures have been proposed to achieve the best energy efficiency, at the cost of less reconfigurability, which makes them suitable only for specific kinds of machine learning algorithms such as deep convolutional neural networks. Finally, analog computing is a promising methodology for computing large-sized machine learning algorithms due to its low design cost and fast computing speed; however, because of the analog-to-digital converter (ADC) required in analog computing, this kind of technique is only applicable at low computation resolution, making it unsuitable for most artificial intelligence (AI) applications.
1. Introduction
Machine learning (ML) is currently widely used in many modern artificial intelligence (AI) applications [1]. Breakthroughs in computation ability have enabled systems to compute complex ML algorithms in a relatively short time, providing real-time human-machine interaction such as face detection for video surveillance, advanced driver-assistance systems (ADAS), and image recognition for early cancer detection [2, 3]. In all these applications, high detection accuracy requires complex ML computation, which comes at the cost of high computational complexity. This places a high demand on the hardware platform. Currently, most applications are implemented on general-purpose compute engines, especially graphics processing units (GPUs). However, work recently reported
from both industry and academia shows a trend toward the design of application-specific integrated circuits (ASICs) for ML, especially in the field of deep neural networks (DNNs). This chapter gives an overview of hardware accelerator design, the various types of ML acceleration, and the techniques used to improve the hardware computation efficiency of ML.
Over the past decades, graphics processing units (GPUs) have become the popular and standard choice for training deep-learning algorithms or convolutional neural networks for face and object detection/recognition, data mining, and other artificial intelligence (AI) applications. GPUs offer a wide range of hardware selections, high-performance throughput/computing power, and a stable but ever-expanding ecosystem. The GPU architecture is usually implemented with several mini graphics processors. Each graphics processor has its own computation units and local cache, which is well suited to matrix multiplication. A shared high-speed bus connects the mini processors to enable fast data exchange among them; in addition, it acts as a bridge between the main CPU and the mini graphics processors.
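As a simple illustration of how this organization is exercised in practice, the following sketch (assuming a CUDA-capable GPU and the standard PyTorch API, neither of which is part of the designs discussed in this chapter) offloads a large matrix multiplication, the operation the per-processor compute units and caches are best suited for, onto the GPU.

```python
import torch

# Two large operand matrices, created in host (CPU) memory
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

if torch.cuda.is_available():
    # Move the operands into GPU memory; the multiply is then split across
    # the GPU's many CUDA cores (grouped into multiprocessors)
    a, b = a.cuda(), b.cuda()

c = torch.matmul(a, b)        # runs in parallel on whichever device holds a and b

if torch.cuda.is_available():
    torch.cuda.synchronize()  # GPU kernels launch asynchronously; wait for completion
print(c.shape)                # torch.Size([4096, 4096])
```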
Taking NVIDIA's DGX-1 as an example [4], the DGX-1 has eight Tesla P100-SXM2 GPUs based on the Pascal architecture. Each GPU has 56 multiprocessors with 64 CUDA cores per multiprocessor, giving each GPU 3584 CUDA cores. The GPU and memory clock frequencies are 1.3 GHz and 700 MHz, respectively. Each GPU has a 4096-bit memory bus, 16 GB of global memory, and a 4 MB L2 cache. Figure 1 shows the system-level topology of the DGX-1. The NVLink interconnect network is wired so that any two GPUs can reach each other through at most one intermediate GPU. The GPU cluster is connected to a PCIe switch (PLX) through a PCIe ×16 interconnect. The maximum bandwidth of the NVLink interconnect with the Tesla P100 is reported to be 160 GB/s. In a clustering or multicore parallel computation scenario, the performance of the communication interconnect becomes the bottleneck to achieving high throughput, low latency, and high energy efficiency. Figure 2(a) and (b) shows that the DGX-1 GPU outperforms a comparable Intel Knights Landing (KNL) CPU in power efficiency and computing throughput for two different batch sizes when running CifarNet.
The GPU offers significant computation speed thanks to its many parallel processing cores. However, relatively large power consumption is also required for the computation and data movement, and a high-speed interconnect interface is needed to support the fast data exchange. Thus, compared with other techniques, the GPU offers powerful computation ability at the expense of high design cost (unit price) and power consumption.
Figure 2. Power and performance of CifarNet/Cifar-10 with batch sizes of (a) 96 and (b) 192.

As the industry matures, field programmable gate arrays (FPGAs) are now starting to emerge as credible competition to GPUs for implementing CNN-based deep learning algorithms. Microsoft Research's Catapult Project garnered quite a bit of attention in the industry when it contended
that using FPGAs could be as much as 10 times more power efficient than GPUs [5]. Although the performance of a single FPGA was much lower than that of comparably priced GPUs, the fact that its power consumption was much lower could have significant implications for many applications where high performance is not the top priority. Figure 3(a) shows a logical view of FPGAs in a cloud-scale application, and Figure 3(b) shows how the FPGA-based accelerator fits into a host server.
As Figure 3(b) shows, an FPGA-based machine learning accelerator typically involves hardware blocks such as DRAM, CPUs, a network interface controller (NIC), and FPGAs. The DRAM acts as a large buffer to store temporary data, while the CPU is in charge of managing the computation, including sending instructions to the FPGAs. The FPGA is programmed to fit the ML algorithm. Since the ML algorithm is optimized at the hardware level through FPGA programming, high data access efficiency is obtained compared with regular GPU computation, which has no hardware optimization for the corresponding ML algorithms.
Although the FPGA reduces the power consumption of the computation by optimizing the ML algorithms in the hardware design, its overall efficiency is still much lower than that of an ASIC built for a single kind of algorithm. Compared with an ASIC, the programmability introduced by the FPGA also brings complicated logic, which increases the hardware design cost. In addition, the clock speed of an FPGA is usually limited to around 300 MHz, which is 4-5 times lower than that of a typical ASIC [6].
Figure 3. (a) Decoupled programmable hardware plane; (b) server plus FPGA schematic.

In HPC and datacenter settings, hardware accelerator solutions are dominated by GPUs and FPGAs, and state-of-the-art machine-learning computation mostly relies on cloud servers.
However, high power consumption limits this approach in many real application scenarios. Since cloud-based AI applications on portable devices require network connectivity, the quality of the network connection affects the user experience, and the network and communication latency is not acceptable for real-time AI applications. In addition, most IoT AI applications have strict power and cost constraints, which can support neither high-power GPUs nor the transmission of large amounts of data to cloud servers.
To address the abovementioned issues, several edge-based AI processing schemes were introduced in [7-9]. Edge-based AI processing aims to utilize the localized data at the edge and avoid network communication overhead. Currently, most localized AI processors focus on processing convolutional neural networks (CNNs), which are widely used in computer vision algorithms and require substantial computing resources.
State-of-the-art convolutional neural networks commonly include three different computational layers: the convolution layer, the pooling layer, and the fully connected layer. The convolution layer is the most computation-intensive part of the network. The pooling layer is inserted between two convolution layers to reduce the intermediate data size and remap the feature maps. The fully connected layer is usually the last layer of the CNN and predicts the labels of the input data; it is limited by memory bandwidth rather than by computation resources.
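The following NumPy sketch illustrates the roles of the pooling and fully connected layers described above; the feature-map shapes, the 2 × 2 pooling window, and the 10 output classes are illustrative assumptions rather than values from this chapter.

```python
import numpy as np

def max_pool_2x2(fmaps):
    """2 x 2 max pooling: halves each spatial dimension, shrinking the
    intermediate data passed between convolution layers by 4x."""
    C, H, W = fmaps.shape
    return fmaps.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def fully_connected(features, weights, bias):
    """Fully connected layer: one matrix-vector product over the flattened
    features; its cost is dominated by streaming the weight matrix from memory."""
    return weights @ features.ravel() + bias

fmaps = np.random.rand(8, 16, 16)              # 8 feature maps of 16 x 16
pooled = max_pool_2x2(fmaps)                   # -> (8, 8, 8)
w = np.random.rand(10, pooled.size)            # one row of weights per class label
logits = fully_connected(pooled, w, np.random.rand(10))
print(pooled.shape, logits.shape)              # (8, 8, 8) (10,)
```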
The primary role of a convolution layer is to apply a convolution function that maps the input (previous) layer's images to the next layer. The data of each input layer are composed of multiple channels, forming a three-dimensional tensor. One set of regional filter windows is defined as one filter, or weight. The results are produced by inner-product computation between the filter weights and the input data: an output feature is obtained by using the filter, or weight, to scan and accumulate the different input channels. After the inner-product computation, a separate bias vector (with the same dimension as the number of output features) is added to each final result. The analytical representation of the convolution layer is shown in Eq. (1) and Figure 4.
O[o][m][x][y] = B[m] + Σ_{k=1}^{M} Σ_{i=1}^{K} Σ_{j=1}^{K} I[o][k][αx + i][αy + j] · W[m][k][i][j],

1 ≤ o ≤ N, 1 ≤ m ≤ M, 1 ≤ x, y ≤ S_o  (1)
O, B, I, and W are the output features, biases, input features, and filters, respectively. Here, I[r][c] represents the input channel's data at position (r, c), and K is the kernel size of the pooling window.
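A direct NumPy evaluation of Eq. (1) is sketched below. It uses 0-based indices instead of the 1-based indices above, treats α as the stride, applies one bias per output feature as described in the text, and assumes square inputs and filters; the tensor shapes in the example are arbitrary.

```python
import numpy as np

def conv_layer(I, W, B, alpha=1):
    """Direct evaluation of Eq. (1).

    I: input features, shape (N, C, H, H)   -- N images, C input channels
    W: filters,        shape (M, C, K, K)   -- M output features
    B: biases,         shape (M,)           -- one bias per output feature
    alpha: convolution stride
    """
    N, C, H, _ = I.shape
    M, _, K, _ = W.shape
    S_o = (H - K) // alpha + 1              # output feature map size
    O = np.zeros((N, M, S_o, S_o))
    for o in range(N):                      # image index
        for m in range(M):                  # output feature / filter index
            for x in range(S_o):
                for y in range(S_o):
                    patch = I[o, :, alpha * x:alpha * x + K, alpha * y:alpha * y + K]
                    O[o, m, x, y] = B[m] + np.sum(patch * W[m])
    return O

# Example: 2 images, 3 input channels of 8 x 8, four 3 x 3 filters, stride 1
O = conv_layer(np.random.rand(2, 3, 8, 8), np.random.rand(4, 3, 3, 3), np.random.rand(4))
print(O.shape)  # (2, 4, 6, 6)
```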
One representative design is the central computation architecture reported in 2015 [10]. The central computation architecture has one large PE array. Multiple filters are sent into the PE array to enable parallel computation, and the output result of each filter is gathered at the PE array's output and fed back to memory for the next layer's computation. The large PE array of the central computation architecture is beneficial for computing CNNs with large kernels; however, the array needs to be reconfigured when computing CNNs with small kernels.
On the other hand, a sparse computation architecture is made of many parallel small convolution units that fit small kernels [11]. Figure 7 shows one such implementation. The computing unit (CU) engine array is made of sixteen 3 × 3 kernel-sized convolution units. This is beneficial for computing small kernel-sized convolution operations and simplifies the data flow. However, the computing units only support 3 × 3 convolutions, so for kernel sizes larger than 3 × 3, a kernel decomposition technique is proposed in the following section.
The filter kernel size in a typical CNN can range from very small (1 × 1) to very large (11 × 11), so a hardware engine needs to be designed to support convolution operations of various sizes. In the sparse architecture, however, the computation units are separated into many small blocks. Each block consists of a small-sized processing engine array and can only support small-sized convolutions, making it hard for a single block to process a large convolution. To minimize the hardware resource usage, a filter decomposition algorithm is proposed to compute any large kernel-sized (>3 × 3) convolution using only 3 × 3-sized CUs [11]. The algorithm consists of three steps: (1) It first examines the kernel size of the filter. If the original filter's kernel size is not an exact multiple of three, zero-padding weights are added at the original filter's kernel boundary to extend the kernel size to a multiple of three; the added weights are all zero so that the extended filter's convolution result stays the same as the original one. (2) The extended filter is decomposed into several 3 × 3-sized filters. Each decomposed filter is assigned a shift address based on the relative position of its top-left weight in the original filter, and each decomposed filter is computed individually. (3) The output results of the decomposed filters are summed together according to their shift addresses to generate the final output. The mathematical derivation of this decomposition technique is explained in [11].
Figure 8 shows an example of decomposing a 5 × 5 filter into four 3 × 3 filters using this technique. One row and one column of zero padding are added to the original filter. The decomposed filters F0, F1, F2, and F3 have shift addresses (0,0), (0,3), (3,0), and (3,3), respectively. Figure 9 shows the detailed procedure.
Figure 9. Filter decomposition technique to compute a 5 × 5 filter on a 7 × 7 image. The 5 × 5 filter is decomposed into four separate 3 × 3 filters F0, F1, F2, and F3, generating four sub-images. The sub-images are summed together to generate the final output. Pixels of the same color in each sub-image are added together to generate the corresponding pixels in the output image.
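The numerical example of Figure 9 can be reproduced with the short NumPy sketch below. It is a behavioral model of the three steps only, not of the CU hardware in [11]; the 7 × 7 image and 5 × 5 filter contents are random, and the `conv2d_valid` helper is an assumed plain stride-1 convolution.

```python
import numpy as np

def conv2d_valid(img, filt):
    """Plain 2-D convolution (cross-correlation), stride 1, 'valid' output size."""
    H, W = img.shape
    K = filt.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(img[x:x + K, y:y + K] * filt)
    return out

def decomposed_conv(img, filt):
    """Large-kernel convolution computed with 3 x 3 sub-filters, as in Figure 9."""
    K = filt.shape[0]
    pad = (3 - K % 3) % 3                    # step (1): zero-pad the kernel to a
    ext = np.zeros((K + pad, K + pad))       # multiple of 3 (5 x 5 -> 6 x 6)
    ext[:K, :K] = filt
    pimg = np.zeros((img.shape[0] + pad, img.shape[1] + pad))
    pimg[:img.shape[0], :img.shape[1]] = img # padded copy so every sub-window exists
    out_h, out_w = img.shape[0] - K + 1, img.shape[1] - K + 1
    total = np.zeros((out_h, out_w))
    for r in range(0, K + pad, 3):           # step (2): 3 x 3 sub-filters, each with
        for c in range(0, K + pad, 3):       # shift address (r, c)
            part = conv2d_valid(pimg, ext[r:r + 3, c:c + 3])
            total += part[r:r + out_h, c:c + out_w]   # step (3): sum at shift address
    return total

img = np.random.rand(7, 7)                   # 7 x 7 input image, as in Figure 9
filt = np.random.rand(5, 5)                  # 5 x 5 filter: F0..F3 at (0,0),(0,3),(3,0),(3,3)
assert np.allclose(conv2d_valid(img, filt), decomposed_conv(img, filt))
```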
Due to the rapid growth of deep learning model sizes, model compression is becoming more and more important for machine-learning hardware acceleration, especially for edge-side use cases. In addition, fixed-point data formats are used in many deep learning applications to reduce the computation cost [13].
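As a concrete illustration of the fixed-point idea, the sketch below quantizes a weight vector to a signed 8-bit fixed-point format; the 8-bit width and 6 fractional bits are assumed values chosen for illustration, not a format prescribed in [13].

```python
import numpy as np

def to_fixed_point(x, n_bits=8, frac_bits=6):
    """Quantize floating-point values to signed fixed-point with n_bits total
    bits and frac_bits fractional bits, and return the values the hardware
    would effectively compute with."""
    scale = 2 ** frac_bits
    q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x * scale), q_min, q_max)     # integer codes
    return q / scale                                    # de-quantized values

weights = np.random.randn(1000) * 0.5
fx_weights = to_fixed_point(weights)
print("max quantization error:", np.max(np.abs(weights - fx_weights)))
```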
Figure 11. A schematic showing the basic operation of a CTT device (equally applicable to FinFET-based CTTs): (1) charge trapping operation and (2) charge de-trapping operation.
Charge-trapping transistors (CTTs) were reported to be usable as digital memory devices with reliable trapping and de-trapping behavior. Unlike other charge-trapping devices such as floating-gate transistors, transistors with an organic gate dielectric, and carbon nanotube transistors, CTTs are manufacturing-ready and fully CMOS compatible in terms of process and operation. It has been shown that more than 90% of the trapped charge can be retained after 10 years, even when the device is baked at 85 °C [15].
A schematic of the basic operation of a CTT device is depicted in Figure 11. The device
threshold voltage, VT, is modulated by the charge trapped in the gate dielectric of the transis-
tor. VT increases when positive pulses are applied to the gate to trap electrons in the high-k
layer and decreases when negative pulses are applied to the gate to de-trap electrons from the
high-k layer. CTT devices can be programmed by applying logic-compatible voltages.
A memristive computing engine based on the charge-trapping transistor (CTT) has been proposed. The proposed memristive computing engine consists of 784 by 784 CTT analog multipliers and achieves a 100× power and area reduction compared with the conventional digital approach. By implementing a novel sequential analog fabric (SAF), the mixed-signal interfaces are simplified, and only an 8-bit analog-to-digital converter (ADC) is required in the system. The top-level system architecture is shown in Figure 12. A 784 by 784 CTT computing engine is implemented in TSMC 28 nm CMOS technology and occupies 0.68 mm², as shown in Figure 13. It achieves 69.9 TOPS at a 500 MHz clock frequency and consumes 14.8 mW.
Figure 12. Top-level system architecture of the proposed memristive computing engine, including the CTT array and the mixed-signal interfaces: a tunable low-dropout regulator (LDO), an analog-to-digital converter (ADC), and the novel sequential analog fabric (SAF).
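A purely behavioral sketch of such an engine is given below: a matrix-vector product stands in for the analog multiply-accumulate performed by the CTT array, and the result is passed through an idealized 8-bit ADC model. The normalized weights, the full-scale choice, and the absence of device noise are simplifying assumptions; the sketch does not model the SAF or LDO circuits.

```python
import numpy as np

def analog_mac(weights, x, adc_bits=8):
    """Behavioral model: all multiply-accumulates happen in the analog domain
    (here, an ideal dot product); only the accumulated outputs are digitized."""
    currents = weights @ x                              # ideal analog column sums
    full_scale = np.max(np.abs(currents)) + 1e-12       # ADC input range
    levels = 2 ** (adc_bits - 1) - 1
    codes = np.round(currents / full_scale * levels)    # 8-bit quantization
    return codes / levels * full_scale

weights = np.random.uniform(-1, 1, (784, 784))          # normalized CTT cell weights
x = np.random.uniform(0, 1, 784)                        # input activations
y_adc = analog_mac(weights, x, adc_bits=8)
y_ideal = weights @ x
print("relative error:", np.linalg.norm(y_adc - y_ideal) / np.linalg.norm(y_ideal))
```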
Compared with a traditional digital processor, an analog computing processor achieves much lower power consumption as well as a large area reduction. Table 1 compares the computation ability of the analog processor and a digital processor. As it shows, the analog processor achieves more than 100 times the computing speed with about one-tenth of the area of the digital processor.
Even though analog computing shows advantages in computation speed and design cost, its low computing resolution limits its application in most ML algorithms. Due to the design challenges of the ADC in the analog processor, the processor can only handle computation resolutions of roughly 10 bits or less, making it unsuitable for most AI applications.
Table 1. Comparison between analog computing and digital computing in Ref. [14].
3. Conclusion
In this chapter, various computation hardware platforms for machine learning algorithms have been discussed. Among them, the GPU is the most widely used one due to its fast computation speed and compatibility with various algorithms. FPGAs show better energy efficiency than GPUs when computing machine learning algorithms, at the cost of lower speed. Finally, different ASIC architectures have been proposed to support certain kinds of machine learning algorithms, such as deep convolutional neural networks, together with model compression techniques to improve hardware performance. Compared with the GPU and FPGA, the ASIC shows the best energy efficiency and computation speed, however at the cost of reconfigurability across ML algorithms. Depending on the specific application, designers should select the most suitable computation hardware platform.
Author details

Li Du and Yuan Du
References
[2] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Proceedings of Advances in Neural Information Processing Systems. 2012;25:1097-1105
[3] Silver D et al. Mastering the game of Go with deep neural networks and tree search. Nature. Jan. 2016;529(7587):484-489
[4] Gawande NA, Landwehr JB, Daily JA, Tallent NR, Vishnu A, Kerbyson DJ. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. In: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW); Lake Buena Vista. 2017. pp. 399-408
[5] Putnam A. The configurable cloud – accelerating hyperscale datacenter services with FPGA.
In: IEEE 33rd International Conference on Data Engineering (ICDE); San Diego. 2017. p. 1587
[6] Chang AXM, Culurciello E. Hardware accelerators for recurrent neural networks on
FPGA. In: IEEE International Symposium on Circuits and Systems (ISCAS); Baltimore, MD. 2017. pp. 1-4
[7] Sim J, Park J-S, Kim M, Bae D, Choi Y, Kim L-S. A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems. In: Proceedings of IEEE International Solid-State Circuits Conference (ISSCC). Jan/Feb. 2016. pp. 264-265
[8] Bong K, Choi S, Kim C, Kang S, Kim Y, Yoo HJ. 14.6 A 0.62mW ultra-low-power convolutional-neural-network face-recognition processor and a CIS integrated with always-on Haar-like face detector. In: IEEE International Solid-State Circuits Conference (ISSCC); San Francisco. 2017. pp. 248-249
[9] Desoli G et al. 4.1 A 2.9TOPS/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems. In: IEEE International Solid-State Circuits Conference (ISSCC); San Francisco. 2017. pp. 238-239
[10] Chen YH, Krishna T, Emer J, Sze V. 14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. In: IEEE International Solid-State Circuits Conference (ISSCC); San Francisco. 2016. pp. 262-263
[11] Du L et al. A reconfigurable streaming deep convolutional neural network accelerator for Internet of Things. IEEE Transactions on Circuits and Systems I: Regular Papers. 2017;99:1-11
[12] Han S, Mao H, Dally W. Deep compression: Compressing DNNs with pruning, trained quantization and Huffman coding. 2015. arXiv:1510.00149v3
[13] Du Y et al. A streaming accelerator for deep convolutional neural networks with image and
feature decomposition for resource-limited system applications. Sep. 2017. arXiv:1709.05116
[cs.AR]
[15] Shin S, Kim K, Kang SM. Memristive computing: Multiplication and correlation. In: IEEE International Symposium on Circuits and Systems; Seoul. 2012. pp. 1608-1611
[16] Desoli G et al. 14.1 A 2.9TOPS/W deep convolutional neural network SoC in FD-SOI
28nm for intelligent embedded systems. In: IEEE International Solid-State Circuits Con-
ference (ISSCC); San Francisco. 2017. pp. 238-239