Abstract—Convolutional Neural Networks (CNNs) are a very popular class of artificial neural networks. Current CNN models [...] The design is verified on the hardware using a Nexys 4 DDR FPGA evaluation board.
In the convolutional layers, the input image is converted into downsampled feature maps. These features are used for the classification in the fully connected layers. Fully connected layers are feed-forward neural networks consisting of one or more hidden layers. After the hidden layers, there is a final output layer showing the class scores of each object to be classified. The general block diagram of a typical CNN is shown in Figure 1.

Figure 1: A typical block diagram of a CNN with one input layer, one convolutional layer, one pooling layer, two hidden layers, and one output layer
In recent years, FPGA-based CNN accelerators have become a promising research area. Custom parallel processing capabilities and higher performance-per-watt values make FPGAs attractive for CNN implementations. Different CNN architectures have been implemented on FPGA platforms in the literature [3]. In order to decrease the computational complexity and memory requirements, binarized neural networks are used in some studies [11], [12]. They reduce execution times using bitwise operations; however, their accuracy is generally lower than that of fixed-point models [13]. Some FPGA implementations focus on the optimization of the convolution engine [14], [15]. These engines perform the convolution operation in a pipelined manner. There are also works using Zynq-series FPGAs, in which the embedded processor and the programmable logic process the data together in the accelerator [16], [17]. LeNet, AlexNet, and VGGNet are the most popular CNNs used in FPGA implementations. However, their power consumption is generally compared against processor, GPU, or PC implementations, which is not a fair comparison [16], [18], [19]. Since FPGAs are inherently energy-efficient devices, a fair comparison should be made between FPGA implementations. In this work, the Artix-7 FPGA family is selected intentionally because Artix-7 devices are the most cost-effective and energy-efficient among the Xilinx FPGA series [20]. Moreover, keeping the resource usage as low as possible without degrading the performance helps fit the whole CNN architecture into a very small package FPGA (i.e., 1 cm x 1 cm) while consuming only 628 mW. This not only helps in developing compact designs but also makes the CNN accelerator cost-effective.
III. ACCELERATOR DESIGN
In this section, the proposed CNN accelerator is explained in detail. In this work, a LeNet CNN architecture has been developed, implemented, and verified on the FPGA platform. The developed LeNet CNN structure is given in Figure 2. The CNN input is a 32 x 32 grayscale image and the output is the classification result. The network is first developed in Python using TensorFlow. It uses fixed-point data types in all stages, and the bit widths are optimized based on the accuracy drop relative to the floating-point model. It is seen that, for less than a 0.1 percentage-point drop in accuracy compared to the floating-point model, using 8-bit weights, 16-bit activations, and 32-bit biases is sufficient for the hardware design. The accuracies of the fixed-point and floating-point designs are both greater than 98%. Moreover, the number of layers and the number of feature maps are heuristically optimized to improve the accuracy. As a result, the optimized CNN consists of 2 convolutional layers, 2 max-pooling layers, a hidden fully connected layer, and an output layer. The two convolutional layers use 3 and 12 feature maps, respectively. Except for the last layer, the ReLU activation function is used in the convolutional and hidden layers, and max-pooling is used in the pooling layers. In the convolutional layers, a 5 x 5 convolution kernel is selected for its better performance compared to smaller kernels. In the pooling layers, 2 x 2 kernels are used and the downsampling factor is two. After the convolutional layers, the data is flattened and fed to the fully connected layers. The hidden layer uses 48 neural network nodes, and the output layer has 10 nodes corresponding to the ten digits to be classified. The CNN is trained and tested using the MNIST dataset, a handwritten digit dataset commonly used for various image processing systems [21].
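For illustration, these bit widths could be expressed in Vitis HLS with arbitrary-precision fixed-point types roughly as in the sketch below; the type names and integer/fraction splits are assumptions, since only the total widths (8/16/32 bits) are specified here.

#include "ap_fixed.h"

// Illustrative quantized types: the total widths follow the text
// (8-bit weights, 16-bit activations, 32-bit biases); the number
// of integer bits is assumed for the sketch.
typedef ap_fixed<8, 2>   weight_t;
typedef ap_fixed<16, 6>  act_t;
typedef ap_fixed<32, 12> bias_t;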
Figure 2: Proposed CNN for FPGA implementation with two convolutional layers, two pooling layers, one fully connected layer, and an output layer

After optimizing the fixed-point model in Python, the CNN accelerator is developed on the FPGA platform using this fixed-point model. The convolutional, pooling, and fully connected layers are coded according to the model. Each blue box in Figure 2 is designed separately in Vitis HLS. The Vitis HLS tool transforms C, C++, or SystemC code into a register transfer level (RTL) implementation for use in Xilinx FPGAs. Using pragmas in the software code, different parallelization levels and hence different hardware can be generated. The Vitis HLS methodology allows designers to develop and verify designs faster than with traditional hardware description languages.
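As a minimal example of this pragma-driven flow (an illustrative loop, not code from this design), the same C loop can be turned into different hardware by changing a single directive:

// With PIPELINE, one loop iteration is issued per clock cycle;
// replacing the directive with "#pragma HLS UNROLL factor=4"
// would instead replicate the multiplier four times.
void scale(const short in[1024], short out[1024], short k) {
    for (int i = 0; i < 1024; i++) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * k;
    }
}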
In the CNN accelerator design, the key processing operation is the convolution; it dominates the total processing time and therefore needs to be carefully designed in hardware. As seen in Figure 3, the two-dimensional convolution is calculated for each pixel and generates one output pixel of the feature map.

Figure 3: Two-dimensional convolution of the input image (left-hand side) to the output image (right-hand side)

From Figure 3, it can be seen that each pixel of the output feature map requires 25 multiplications and 25 additions. In order to increase the throughput, the convolution for each pixel is done in one clock cycle. As a result, 25 DSP slices are used in the convolution operation. DSP slices are the basic elements of the FPGA for arithmetic operations; basic DSP operations such as accumulators, multipliers, and adders can be implemented using these slices. Since the number of DSP slices is limited, they are used only for multiplication operations; other mathematical operations are done in the programmable logic of the FPGA device.
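A minimal sketch of such a one-pixel-per-cycle engine is shown below, reusing the illustrative fixed-point types defined earlier; the function name and interface are assumptions, but the fully unrolled 5 x 5 loop nest is what maps the 25 multiplications onto 25 DSP slices.

// One output pixel per call; with the function pipelined at II=1,
// HLS fully unrolls the loop nest, so the 25 multiplications run
// in parallel on 25 DSP slices.
act_t conv5x5(act_t window[5][5], weight_t kernel[5][5], bias_t bias) {
#pragma HLS ARRAY_PARTITION variable=window complete dim=0
#pragma HLS ARRAY_PARTITION variable=kernel complete dim=0
#pragma HLS PIPELINE II=1
    bias_t acc = bias;
    for (int r = 0; r < 5; r++) {
        for (int c = 0; c < 5; c++) {
#pragma HLS UNROLL
            acc += window[r][c] * kernel[r][c];  // 25 parallel MACs
        }
    }
    return (act_t)acc;
}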
In the second convolutional layer, since 12 feature maps exist, one convolution engine is used for every four feature maps, giving three engines and a total of 75 DSP slices in that layer. Finally, in the hidden layer and the output layer, the fully connected layers are parallelized in order to decrease the processing time: a parallel path is created for each feature map at the output of the second pooling layer. In other words, the hidden layer multiplications are performed using 12 DSP slices. The resource usage of each layer is shown in Table 1. In addition to these parallelizations, for each layer the internal memories storing the weights and biases are concatenated so that the data can be accessed in one clock cycle, avoiding memory bottlenecks.
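The sketch below illustrates this memory-concatenation idea with the ARRAY_RESHAPE directive (the array name and shape are assumptions): several narrow weight words are packed into one wide BRAM word, so a full kernel row is fetched in a single clock cycle.

// Reshaping the innermost dimension packs each 5-weight kernel
// row into one wide memory word, read in a single cycle.
void conv_layer1(/* interface omitted in this sketch */) {
    static weight_t weights[3][5][5];
#pragma HLS ARRAY_RESHAPE variable=weights complete dim=3
    // ... per-pixel convolution using conv5x5() ...
}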
TABLE I: RESOURCE USAGE OF THE DIFFERENT LAYERS

               BRAM   DSP    LUT     FF
Conv Layer 1      6    25   4670   4690
Conv Layer 2     14    75  11436  12057
Hidden Layer      7    12    371    332
Output Layer      2     8    297    209
In the accelerator design, since optimized bit widths are used for the weights and biases, these coefficients fit into the internal memories of the FPGA. Using the internal memory of the FPGA achieves high memory bandwidth and decreases the number of clock cycles required to finish the CNN operations. Every layer is optimized in terms of processing time as much as possible based on the available resources of the FPGA device. The number of clock cycles for processing each layer is shown in Table 2. As shown in Table 2, the total processing time for one CNN inference is 70 us; in other words, the proposed CNN accelerator can process 14K images/sec.
TABLE II: PROCESSING TIME OF THE DIFFERENT LAYERS

                                  Clock Cycles   Processing Time
Conv Layer 1 + Max Pool Layer 1       3144          25.15 us
Conv Layer 2 + Max Pool Layer 2       3599          28.8 us
Hidden Layer                          1878          15.02 us
Output Layer                           160           1.28 us
Total                                 8781          70.2 us
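As a quick consistency check on Table 2: at the 125 MHz clock reported in the evaluation, one cycle is 8 ns, so 8781 cycles take 8781 x 8 ns ≈ 70.2 us, and the resulting throughput is 1 / 70.2 us ≈ 14,240 images/sec, matching the reported 14K images/sec.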
After designing each layer, the CNN accelerator is created using Vivado by cascading these layers. The final design is placed and routed without any placement, routing, or timing errors.
IV. EVALUATION

The design is implemented and tested on a Digilent Nexys 4 DDR FPGA board [22]. The board is equipped with a Xilinx Artix-7 XC7A100T FPGA. The overall design runs at 125 MHz. The overall resource usage of the whole design is given in Table 3. Since the design has very low resource usage, it can fit into the smallest-package FPGAs of the Xilinx 7 series, such as the XC7A50T (i.e., 1 cm x 1 cm in dimension) [23]. Moreover, the power consumption of the whole design is 628 mW. This figure is taken from Vivado's power report of the implemented design and consists of 94 mW static and 534 mW dynamic power. This is nearly 67% lower than other LeNet CNN architectures, which consume around 1800 mW [16], [17].

TABLE III: RESOURCE USAGE OF THE PROPOSED CNN ACCELERATOR

                                        BRAM   DSP    LUT      FF
Used Resources                            29   120  15951   17664
Resources in Nexys 4 DDR Board           135   240  63400  126800
Utilization in Nexys 4 DDR Board (%)   21.48    50  25.16   13.93

In the experimental setup, the images are loaded using the serial interface of the board and the result is shown on the LEDs of the board. Meanwhile, the output of each layer in the FPGA CNN accelerator is verified bitwise by matching it against the outputs of the Python design using the Vivado hardware manager. In other words, the Python and FPGA designs give exactly the same result at each stage of the CNN. Moreover, for a fair comparison, the proposed accelerator is compared with other LeNet CNN implementations in the literature having the same number of convolutional and fully connected layers [17], [24], [25]. The design of [24] uses a Zynq UltraScale FPGA, and HLS is used in the development stage. In the design of [25], a ZCU102 board with a Xilinx ZU9EG FPGA chip is used, and different accelerators are used for processing the CNN layers. Lastly, the study of [17] uses a Digilent Arty Z7-20 development board based on the Xilinx Zynq-7000 System on Chip (SoC). That design proposes a HW/SW co-processing accelerator: it uses the programmable logic as an accelerator, and the system is managed by the ARM processor. A performance comparison with these studies is given in Table 4. As clearly seen from Table 4, the proposed accelerator has lower usage of DSPs and BRAMs, the most critical FPGA resources, and a much lower processing time than the other implementations. Efficient use of pragmas such as pipelining, loop unrolling, and memory reshaping in the proposed design achieves much higher throughput than the other implementations. Besides, using the internal memories of the FPGA instead of external memory decreases the processing time even further.

TABLE IV: COMPARISON OF DIFFERENT LENET CNN ACCELERATORS

                BRAM   DSP    LUT   Processing Time
González [17]     44   153   4738        2268 us
Cho [24]          95   143  32689        3500 us
Shi [25]          54   204  25276         170 us
This work         29   120  15951          70 us
V. CONCLUSION

In this work, an FPGA-based accelerator for CNN architectures, in particular the LeNet architecture, is implemented. The fixed-point design uses 8 bits for weights, 16 bits for activations, and 32 bits for biases. The accuracy is higher than 98%, and the difference between the fixed-point and floating-point designs is less than 0.1 percentage points. Vitis HLS is used for designing the layers, and the whole CNN accelerator is finalized in Vivado. The FPGA design is tested and verified on a Nexys 4 DDR evaluation board. The accelerator runs at 125 MHz, and its overall throughput is 14K images/sec while consuming only 628 mW. Therefore, the proposed solution is about 7x better than current LeNet FPGA implementations in performance per watt, and it can be used effectively in real-time embedded CNN applications.

ACKNOWLEDGMENT

This research was supported by The Scientific and Technological Research Council of Turkey (TUBITAK).
REFERENCES

[1] M.-J. Lee and Y.-G. Ha, "Autonomous driving control using end-to-end deep learning," in 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 470–473, IEEE, 2020.
[2] H. Sumida, F. Ren, S. Nishide, and X. Kang, "Environment recognition using robot camera," in 2020 5th IEEE International Conference on Big Data Analytics (ICBDA), pp. 282–286, 2020.
[3] A. Shawahna, S. M. Sait, and A. El-Maleh, "FPGA-based accelerators of deep learning networks for learning and classification: A review," IEEE Access, vol. 7, pp. 7823–7859, 2019.
[4] "Comparing hardware for artificial intelligence: FPGAs vs. GPUs vs. ASICs." http://lreese.dotsenkoweb.com/2019/03/30/comparing-hardware-for-artificial-intelligence-fpgas-vs-gpus-vs-asics/#. Accessed: 2021-03-25.
[5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[6] "Introduction to TensorFlow." https://www.tensorflow.org/learn. Accessed: 2021-03-10.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
[8] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[10] Y. Wang, Y. Li, Y. Song, and X. Rong, "The influence of the activation function in a convolution neural network model of facial expression recognition," Applied Sciences, vol. 10, no. 5, p. 1897, 2020.
[11] P. Wang, J. Song, Y. Peng, and G. Liu, "Binarized neural network based on FPGA to realize handwritten digit recognition," in 2020 IEEE International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), vol. 1, pp. 1204–1207, 2020.
[12] J. H. Kim, J. Lee, and J. H. Anderson, "FPGA architecture enhancements for efficient BNN implementation," in 2018 International Conference on Field-Programmable Technology (FPT), pp. 214–221, 2018.
[13] T. Simons and D.-J. Lee, "A review of binarized neural networks," Electronics, vol. 8, no. 6, p. 661, 2019.
[14] Z. Liu, Y. Dou, J. Jiang, J. Xu, S. Li, Y. Zhou, and Y. Xu, "Throughput-optimized FPGA accelerator for deep convolutional neural networks," ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 10, no. 3, pp. 1–23, 2017.
[15] D. Wu, Y. Zhang, X. Jia, L. Tian, T. Li, L. Sui, D. Xie, and Y. Shan, "A high-performance CNN processor based on FPGA for MobileNets," in 2019 29th International Conference on Field Programmable Logic and Applications (FPL), pp. 136–143, 2019.
[16] D. Rongshi and T. Yongming, "Accelerator implementation of LeNet-5 convolution neural network based on FPGA with HLS," in 2019 3rd International Conference on Circuits, System and Simulation (ICCSS), pp. 64–67, IEEE, 2019.
[17] E. González, W. D. Villamizar Luna, and C. A. Fajardo Ariza, "A hardware accelerator for the inference of a convolutional neural network," Ciencia e Ingeniería Neogranadina, vol. 30, no. 1, pp. 107–116, 2020.
[18] S. Li, Y. Luo, K. Sun, N. Yadav, and K. K. Choi, "A novel FPGA accelerator design for real-time and ultra-low power deep convolutional neural networks compared with Titan X GPU," IEEE Access, vol. 8, pp. 105455–105471, 2020.
[19] Y. Zhou and J. Jiang, "An FPGA-based accelerator implementation for deep convolutional neural networks," in 2015 4th International Conference on Computer Science and Network Technology (ICCSNT), vol. 1, pp. 829–832, IEEE, 2015.
[20] E. Mohsen, "Reducing system power and cost with Artix-7 FPGAs," Xilinx white paper, pp. 1–12, 2012.
[21] L. Deng, "The MNIST database of handwritten digit images for machine learning research [best of the web]," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
[22] "Nexys 4 DDR reference manual." https://reference.digilentinc.com/reference/programmable-logic/nexys-4-ddr/reference-manual. Accessed: 2021-03-10.
[23] Xilinx, "7 series FPGAs packaging and pinout," Product Specification, 2011.
[24] M. Cho and Y. Kim, "Implementation of data-optimized FPGA-based accelerator for convolutional neural network," in 2020 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1–2, IEEE, 2020.
[25] Y. Shi, T. Gan, and S. Jiang, "Design of parallel acceleration method of convolutional neural network based on FPGA," in 2020 IEEE 5th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), pp. 133–137, IEEE, 2020.