Systematic Analysis of FPGA-based Hardware Accelerators for Convolutional Neural Networks
DOI: 10.54254/2755-2721/53/20241258
Fangrong Zhang
School of Electrical Engineering and Information, Southwest Petroleum University,
Chengdu, 610500, China
Abstract. In the modern era, machine learning stands as a pivotal component of artificial
intelligence, exerting a profound impact on various domains. This article delineates a
methodology for designing and applying Field Programmable Gate Array (FPGA) based
hardware accelerators for convolutional neural networks (CNNs). Initially, this paper introduces
CNNs, a subset of deep learning techniques, and underscores their pivotal role in artificial
intelligence, spanning domains such as image recognition, speech processing, and natural
language understanding. Subsequently, we delve into the intricacies of FPGA, an adaptable logic
device characterized by high integration and versatility, elucidating our approach to creating a
hardware accelerator tailored for CNNs on the FPGA platform. To enhance computational
efficiency, we employ technical strategies like dual cache structures, loop unrolling, and loop
tiling for accelerating the convolutional layers. Finally, through empirical experiments
employing YOLOv2, we validate the efficacy and superiority of the designed hardware
accelerator model. This paper anticipates that in the forthcoming years, the methodology and
research into FPGA-based CNN hardware accelerators will yield even more substantial
contributions, propelling the advancement and widespread adoption of deep learning technology.
1. Introduction
In this age of data inundation, deep learning, a significant branch of artificial intelligence, has produced
remarkable outcomes in a variety of areas, such as image recognition, speech processing, natural
language processing, and more. The past few years have seen a great deal of attention drawn to the
convolutional neural network (CNN) from both industry and academia, due to its impressive
accomplishments in many domains, including computer vision and natural language processing [1]. This
network is one of the most influential in the realm of deep learning. The Field Programmable Gate Array
(FPGA) has been widely studied and utilized by researchers and engineers due to its versatility, high
reliability, fast performance, and capacity for reconfiguration.
This paper aims to review the design methods and applications of FPGA-based hardware accelerators
for convolutional neural networks. It first introduces the fundamentals and evolution of convolutional
neural networks, as well as the features and uses of FPGA. It then focuses on an FPGA-based
convolutional neural network hardware accelerator design method that uses a dual cache structure,
loop unrolling, a loop tiling strategy, and other technical means to achieve the
hardware acceleration of the convolutional layer and improve the computing performance. Finally, the
impact of different quantization precisions on hardware resource consumption is analyzed, and the
process of hardware design and verification is explored. It is expected that this review will provide an
outlook on the future of FPGA hardware accelerator design methods for convolutional neural networks,
and will be helpful and inspirational to related disciplines.
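To make the loop unrolling and loop tiling strategies concrete, the following HLS-style C++ sketch shows a tiled convolution loop nest with unrolled channel loops. The tile sizes, buffer dimensions, and pragma choices are illustrative assumptions rather than the parameters of any specific accelerator discussed in this review; a standard C++ compiler simply ignores the HLS pragmas.

#include <cstdio>

// Hypothetical tile sizes and kernel size, chosen only for illustration.
constexpr int TM = 4;                    // output-channel tile
constexpr int TN = 4;                    // input-channel tile
constexpr int K  = 3;                    // kernel size
constexpr int OH = 14, OW = 14;          // output tile height/width
constexpr int IH = OH + K - 1, IW = OW + K - 1;

// One tiled convolution pass: the tile arrays model on-chip buffers, and the
// channel loops are marked for unrolling into parallel multiply-accumulates.
void conv_tile(const short in[TN][IH][IW],
               const short w[TM][TN][K][K],
               int out[TM][OH][OW]) {
  for (int r = 0; r < OH; ++r) {
    for (int c = 0; c < OW; ++c) {
      for (int m = 0; m < TM; ++m) {
#pragma HLS UNROLL
        int acc = 0;
        for (int n = 0; n < TN; ++n) {
#pragma HLS UNROLL
          for (int i = 0; i < K; ++i)
            for (int j = 0; j < K; ++j)
              acc += in[n][r + i][c + j] * w[m][n][i][j];
        }
        out[m][r][c] = acc;
      }
    }
  }
}

int main() {
  static short in[TN][IH][IW] = {};
  static short w[TM][TN][K][K] = {};
  static int out[TM][OH][OW] = {};
  in[0][0][0] = 2; w[0][0][0][0] = 3;                // tiny smoke test
  conv_tile(in, w, out);
  std::printf("out[0][0][0] = %d\n", out[0][0][0]);  // prints 6
  return 0;
}

Tiling bounds the size of the on-chip buffers, while unrolling converts the innermost multiply-accumulate loops into parallel arithmetic units, trading FPGA logic and DSP resources for throughput.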
3. Technical analysis
The input and output ports of the convolution calculation unit comprise three parts: a feature map
input port, a weight parameter input port, and a feature map output port. Because the number of bias
parameters in the convolution calculation is small, they are stored on the FPGA chip in advance.
During the convolution calculation, the input and output ports use a ping-pong operation, so that data
reading and writing proceed simultaneously under the same clock. This effectively reduces the
transmission delay of reading and writing data and increases the calculation speed of each processing
element (PE), thereby improving the calculation performance of the whole hardware accelerator.
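As a rough software illustration of the ping-pong operation, the buffer-swapping control logic can be sketched as below. The buffer size and the load/compute stand-ins are hypothetical, and real hardware runs the two stages concurrently rather than sequentially as this C++ model does.

#include <array>
#include <cstddef>
#include <cstdio>

constexpr std::size_t kTileWords = 256;           // assumed tile size

using Buf = std::array<short, kTileWords>;

// Stand-in for the DMA stage that fills a buffer from external memory.
void load_tile(Buf& dst, std::size_t tile_idx) {
  dst.fill(static_cast<short>(tile_idx));         // dummy payload
}

// Stand-in for one PE pass over a full on-chip buffer.
void compute_tile(const Buf& src, std::size_t tile_idx) {
  std::printf("computing tile %zu (first word %d)\n", tile_idx, src[0]);
}

void run(std::size_t n_tiles) {
  Buf ping, pong;
  load_tile(ping, 0);                             // prime the first buffer
  for (std::size_t t = 0; t < n_tiles; ++t) {
    Buf& cur  = (t % 2 == 0) ? ping : pong;       // buffer being consumed
    Buf& next = (t % 2 == 0) ? pong : ping;       // buffer being refilled
    if (t + 1 < n_tiles)
      load_tile(next, t + 1);  // in hardware, overlaps with compute_tile
    compute_tile(cur, t);
  }
}

int main() {
  run(4);
  return 0;
}

Because the PE always consumes one buffer while the other is being refilled, the data transfer latency is hidden behind the computation, which is the effect described above.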
As can be seen from Table 1, using 16-bit fixed-point numbers saves more than half of the
hardware resources compared with using 32-bit floating-point numbers. At the same time, actual
testing finds that the accuracy of the final YOLOv2 model is almost unchanged after 16-bit
fixed-point quantization.
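As a concrete illustration of 16-bit fixed-point quantization, the sketch below converts 32-bit floats to an assumed Q8.8 format (8 integer bits, 8 fraction bits); the actual integer/fraction split used in the design is not stated here, so Q8.8 is only a hypothetical example.

#include <algorithm>
#include <cstdint>
#include <cstdio>

constexpr int kFracBits = 8;  // assumed Q8.8 split: 8 integer, 8 fraction bits

// Quantize a float to a saturating 16-bit fixed-point value.
int16_t to_fixed(float x) {
  float scaled = x * static_cast<float>(1 << kFracBits);
  scaled = std::min(std::max(scaled, -32768.0f), 32767.0f);  // saturate
  return static_cast<int16_t>(scaled);
}

// Recover an approximate float from the fixed-point representation.
float to_float(int16_t q) {
  return static_cast<float>(q) / static_cast<float>(1 << kFracBits);
}

int main() {
  const float w = 0.7312f;                        // example weight value
  const int16_t q = to_fixed(w);
  // The round trip shows the quantization error is bounded by 2^-8.
  std::printf("%.4f -> %d -> %.4f\n", w, q, to_float(q));
  return 0;
}

A 16-bit datapath halves on-chip storage and memory bandwidth relative to 32-bit floats, and fixed-point multiply-accumulate units consume far fewer FPGA LUT and DSP resources than floating-point ones, which is consistent with the resource savings noted above.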
In this design, when verifying the correctness of the YOLOv2 network accelerator's results, 100
images of transportation vehicles are selected from the test set of the COCO dataset for verification.
The vehicle images mainly include people, bicycles, cars, motorcycles, airplanes, buses, trains, trucks,
and traffic lights. The test results report timing statistics for the predictions on the input images of
cars and motorcycles: the average prediction time is 0.035338 seconds per image, and the total time is
1.024473 seconds. In the accuracy test, the detection accuracy is 81% for motorcycles and 69% for
cars.
4. Conclusion
This paper reviews the design methods and application research progress of FPGA-based hardware
accelerators for convolutional neural networks. At the outset, the fundamental definition and
development of convolutional neural networks are presented, as well as the features and uses of FPGA. A
detailed description is given of the FPGA-based hardware accelerator design method for convolutional
neural networks, including the specific implementation process and advantages of technical means such
as dual cache structure, loop unrolling and loop tiling strategy. At the same time, the impact of different
quantization precisions on hardware resource consumption is analyzed, and the specific steps and
techniques of hardware design and verification are discussed.
Although FPGA-based design methods for convolutional neural network hardware accelerators have
achieved remarkable results in improving computational performance and storage efficiency, there are
still some challenges. These include how to further improve the performance of hardware accelerators
to meet the needs of larger-scale and more complex neural network models, and how to optimize the
logical resource allocation and utilization of FPGAs to reduce power consumption and cost.
Looking into the future, with the continuous development and innovation of FPGA technology, the
design methods of FPGA-based convolutional neural network hardware accelerators are expected to
see more breakthroughs and progress. For example, the introduction of new FPGA architectures
will provide higher computing power and storage capacity; the integration of heterogeneous computing
environments will achieve more efficient data transmission and computing cooperation; and the
development of adaptive computing and parallel computing algorithms will further improve the
flexibility and intelligence of hardware accelerators. Therefore, the design methods and application
research of FPGA-based convolutional neural network hardware accelerators are expected to achieve
more important results in the future and to make greater contributions to the development and
application of deep learning technology.
References
[1] Li Z, Liu F, Yang W, et al. A survey of convolutional neural networks: analysis, applications, and
prospects. IEEE Transactions on Neural Networks and Learning Systems, 2021.
[2] Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press, 2016.
[3] Gu J, Wang Z, Kuen J, et al. Recent advances in convolutional neural networks. Pattern
Recognition, 2018, 77: 354-377.
[4] Farabet C, Couprie C, Najman L, et al. Learning hierarchical features for scene labeling. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(8): 1915-1929.
[5] Bao Jun, Dong Yachao, Liu Hongzhe. A Survey on the Development of Convolutional Neural
Networks. Proceedings of the 24th Annual Conference on New Network Technologies and
Applications, Network Application Branch of China Computer Users Association, 2020: 16-21.
[6] Wu Yanxia, Liang Kai, Liu Ying, et al. Progress and Trends of FPGA Accelerators for Deep
Learning. Chinese Journal of Computers, 2019, 42(11): 2461-2480.
[7] Zhao Min. Research on Controller Based on FPGA. Northwestern Polytechnical University, 2004.
[8] Wang C, Luo Z. A Review of the Optimal Design of Neural Networks Based on FPGA. Applied
Sciences, 2022, 12(21): 10771.
[9] Parnell K, Bryner R. Comparing and contrasting FPGA and microprocessor system design and
development. WP213, 2004, 1(1): 1-32.
[10] Mencer O, Allison D, Blatt E, et al. The History, Status, and Future of FPGAs: Hitting a nerve
with field-programmable gate arrays. Queue, 2020, 18(3): 71-82.
[11] Huang Peiyu, Zhao Qiang, Li Yulong. Design of Hardware Accelerator for Convolutional Neural
Network Based on FPGA. Computer Applications and Software, 2023, 40(03): 38-44.