1 Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong Kiet, District 10, Ho Chi Minh City, Vietnam
2 Vietnam National University – Ho Chi Minh City, Thu Duc District, Ho Chi Minh City, Vietnam
Abstract
We present a case study of automatic FPGA-based hardware accelerator design using our proposed framework in the image processing domain. With the framework, the resulting systems are optimized in both performance and energy consumption. Moreover, using the framework, designers can implement FPGA platforms without manually describing any hardware cores or the interconnect. The systems offer accelerations in execution time compared to traditional general-purpose processors and manually designed accelerator systems. We use two applications in the image processing domain as experiments to report our work: Canny edge detection and a jpeg converter. The experiments are conducted on both embedded and high-performance computing platforms. Results show that we achieve overall speed-ups of up to 3.15× and 2.87× when compared to baseline systems on the embedded and high-performance platforms, respectively. Our systems consume up to 66.5% less energy than other FPGA-based systems.
Copyright © 2020 Cuong Pham-Quoc licensed to EAI. This is an open access article distributed under the terms of the Creative
Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and
reproduction in any medium so long as the original work is properly cited.
doi: 10.4108/eai.12-5-2020.164497
In this paper, we report a case study for the image processing domain. We use the design framework to implement two different applications, the Canny edge detection and the jpeg image converter. We discuss the design steps and conduct experiments with the systems on both embedded and high-performance computing platforms. We analyze the experimental results of both systems in terms of execution time and energy consumption. We then compare our systems with general-purpose processors and traditional FPGA-based accelerators.
The main contributions of this paper are twofold.

• We first briefly introduce a case study of the image processing domain with two applications when using our proposed framework;
• We analyze and compare results of the systems designed by the framework and other systems.

The rest of the paper is organized as follows. Section 2 briefly presents the framework designed and reported in our previous work. Section 3 discusses the steps in developing two applications in the image processing domain with our framework. We present our experimental results with two computing platforms in Section 4. Finally, the paper is concluded in Section 5.

2. The automatic design framework

In our previous work [1], we proposed an automatic framework for designing an FPGA-based hardware accelerator with a hybrid interconnect. The framework allows designers to develop a hardware accelerator system for a particular application without much knowledge of, or working effort at, the hardware level. Figure 1 depicts the design flow of the proposed framework. Although the design flow includes five automatic processing steps, designers are also able to intervene and manually make further improvements.

In this framework, a target application (developed in a high-level programming language) is profiled in Step 1 to collect the execution time of the functions in the application. This profiling step also creates a communication graph to represent the data communication between functions inside the application. Based on this data communication graph, a hybrid interconnect for the hardware kernels, as well as between the hardware device and the host processor, can be defined appropriately.

The application is then partitioned into hardware and software parts in Step 2. Computationally intensive functions are candidates for acceleration with hardware kernels in the FPGA. Moreover, based on the data communication graph, non-intensive functions may also be implemented in the FPGA to reduce off-chip data communication.

As stated above, the data communication graph is used to define the most suitable hybrid interconnect for the hardware kernels in Step 3. The hybrid interconnect may comprise a bus, a crossbar, a network-on-chip, or a shared buffer, depending on the communication patterns of the kernels. The main purpose of this step is to reduce data communication overhead while keeping the hardware resources used by the interconnect minimized. That, in turn, saves energy, since the fewer resources are used, the less energy is consumed.

The selected functions are then synthesized by high-level synthesis (HLS) toolchains (in this case we use Vivado HLS from Xilinx) to create hardware kernels described in a hardware description language (in our work, we prefer Verilog-HDL). With the support of HLS tools, functions written in high-level programming languages can be compiled to Verilog-HDL automatically. This solves one of the most difficult issues that designers usually face.

Finally, the entire system is developed, synthesized, and mapped to hardware platforms by tools provided by FPGA manufacturers (in our case, we use Xilinx Vivado since Xilinx platforms are targeted). Based on the resource availability of the target platforms, computationally intensive kernels can be replicated to further improve overall performance.

As summarized above, designers do not need to intervene in the framework much because all steps can run automatically. However, if designers want to modify some kernels or the interconnect themselves, they are still able to do so.
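To make Steps 1 and 2 concrete, the sketch below shows one way the profiling output and the data communication graph could be represented and used to pick hardware candidates. It is a minimal illustration written for this discussion, not the framework's actual code: the structure fields, the threshold values, and the numbers in main are assumptions.

#include <stdio.h>

/* One node per application function, filled in by the profiling step (Step 1). */
typedef struct {
    const char *name;       /* function name                     */
    double      exec_share; /* fraction of total execution time  */
    int         to_hw;      /* partitioning decision (Step 2)    */
} FuncNode;

/* One edge per producer/consumer pair in the data communication graph. */
typedef struct {
    int  src, dst;          /* indices into the node array       */
    long bytes;             /* data volume transferred (bytes)   */
} CommEdge;

/* Step 2 (sketch): mark computationally intensive functions for hardware,
 * then also pull in cheap functions that exchange large data volumes with
 * hardware candidates, so that this traffic stays on-chip. */
static void partition(FuncNode *f, int nf, const CommEdge *e, int ne,
                      double time_thresh, long bytes_thresh)
{
    for (int i = 0; i < nf; i++)
        f[i].to_hw = (f[i].exec_share >= time_thresh);

    for (int k = 0; k < ne; k++)
        if (e[k].bytes >= bytes_thresh &&
            (f[e[k].src].to_hw || f[e[k].dst].to_hw)) {
            f[e[k].src].to_hw = 1;
            f[e[k].dst].to_hw = 1;
        }
}

int main(void)
{
    /* Toy numbers chosen for illustration only, not the measured profile. */
    FuncNode f[] = { {"gaussian_smooth", 0.45, 0}, {"derivative_x_y", 0.05, 0},
                     {"magnitude_x_y",   0.20, 0}, {"non_max_supp",   0.20, 0} };
    CommEdge e[] = { {0, 1, 1200000}, {1, 2, 130000}, {2, 3, 280000} };

    partition(f, 4, e, 3, 0.15, 100000);
    for (int i = 0; i < 4; i++)
        printf("%-16s -> %s\n", f[i].name, f[i].to_hw ? "hardware kernel" : "software");
    return 0;
}

With these toy numbers, derivative_x_y is not intensive enough on its own, but the large edge it shares with gaussian_smooth pulls it into hardware, which mirrors the reasoning applied to the Canny application in Section 3.1.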
3. Case study
In this section, we introduce our hardware accelerator systems developed with the framework for the two applications: Canny edge detection [11] and the jpeg image converter [12].
3.1. The Canny edge detection

The Canny edge detection application consists of four main operators: a Gaussian filter, gradient calculation, non-maximum suppression, and finally hysteresis thresholding. The application was first introduced in 1986 and coded in ANSI C. The profiling results indicate that three operators, the Gaussian filter, gradient calculation, and non-maximum suppression, are the most computationally intensive. The framework also generates the data communication graph shown in Figure 2.

[Figure 2: nodes input_functions, gaussian_smooth, derivative_x_y, magnitude_x_y, non_max_supp, and output_functions; each edge is labelled with the transferred data volume in bytes and with its UnMA and UnDV counts.]

Figure 2. Data communication graph of the Canny edge detection application
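The edge weights in Figure 2 come from whole-frame buffers handed from one stage of the chain to the next. As a rough, assumed illustration (the frame size and element types below are not taken from the paper), the per-edge byte counts can be estimated as follows:

#include <stdio.h>

/* Rough estimate of Figure 2 edge weights: every stage of the Canny chain
 * passes one or two whole-frame buffers to its successor. The 320x240 frame
 * and the element types are assumptions, not the profiled values. */
int main(void)
{
    long rows = 240, cols = 320;
    long px = rows * cols;

    printf("input -> gaussian_smooth         : %ld bytes\n", px * (long)sizeof(unsigned char));
    printf("gaussian_smooth -> derivative_x_y: %ld bytes\n", px * (long)sizeof(short));
    printf("derivative_x_y -> magnitude_x_y  : %ld bytes\n", 2 * px * (long)sizeof(short)); /* dx and dy */
    printf("magnitude_x_y -> non_max_supp    : %ld bytes\n", px * (long)sizeof(short));
    return 0;
}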
According to the profiling results, the three aforementioned operators (implemented as the functions gaussian_smooth, magnitude_x_y, and non_max_supp) are good candidates to be accelerated by hardware kernels. However, as illustrated in the data communication graph, the derivative_x_y function should also be accelerated by a hardware kernel because a huge amount of data is transferred between this function and the other candidates. Therefore, all four functions are implemented as accelerators. The other procedures of the application are kept on the general-purpose processor.

During the hybrid interconnect generation step, the most suitable interconnect is designed for the data communication among the four hardware kernels above. Please note that the communication infrastructure for transferring data between the processor and the kernels is already defined by the target platform used for building the system. In this application, a shared buffer is used for the interconnect of the gaussian_smooth and derivative_x_y accelerators, while a network-on-chip (NoC) is used for transferring data between the derivative_x_y kernel and the two remaining kernels.

As discussed above, in this work we target Xilinx platforms for implementing our systems; we therefore use Xilinx Vivado HLS to generate the Verilog-HDL description modules for our hardware kernels. Figure 3 shows a part of the Verilog-HDL description generated by Xilinx Vivado HLS for the gaussian_smooth function.
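The input of that HLS step is plain C. The fragment below is a simplified, assumed stand-in for a separable smoothing loop of the kind gaussian_smooth contains, together with the sort of Vivado HLS pragmas such a kernel typically carries; it is not the application's actual source, and the interface and fixed-point scaling are assumptions.

/* Simplified HLS-style smoothing kernel (assumed code, not the original
 * gaussian_smooth): a 1-D convolution pass over one image row. Vivado HLS
 * turns loops like this into Verilog-HDL modules such as the one excerpted
 * in Figure 3. */
#define KSIZE   5
#define MAXCOLS 1024

void smooth_row(const unsigned char in[MAXCOLS], short out[MAXCOLS],
                const short kernel[KSIZE], int cols)
{
#pragma HLS INTERFACE ap_memory port=in
#pragma HLS INTERFACE ap_memory port=out
    for (int c = 0; c < cols; c++) {
#pragma HLS PIPELINE II=1
        int acc = 0;
        for (int k = 0; k < KSIZE; k++) {
#pragma HLS UNROLL
            int idx = c + k - KSIZE / 2;
            if (idx >= 0 && idx < cols)
                acc += kernel[k] * in[idx];
        }
        out[c] = (short)(acc >> 8);   /* assumed fixed-point scaling */
    }
}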
In this work, we implement the applications on both an embedded system with the Xilinx ML510 board [13] and a high-performance computing system with the Micron HC-2ex [14]. Therefore, the final step needs to be performed for two different target platforms. For the embedded platform, because there is only one FPGA device, we are able to build the system with 5 kernels; hence, only the gaussian_smooth kernel is duplicated. Meanwhile, the high-performance computing platform consists of 4 modern FPGA devices that can host more kernels. Consequently, we replicate the kernels up to 64 accelerators in total.

The system is finally synthesized and mapped to the FPGA devices by the Xilinx Vivado toolchain. This final step is technology dependent. Resource usage and other parameters are reported in detail in Section 4.

3.2. The jpeg image converter

The jpeg image converter is the second application used in our case study. The main purpose of this application is to encode bitmap images to the jpeg format. The application was implemented in ANSI C and reported in the benchmark in [12]. The main part of the application includes four computationally intensive functions: huff_dc_dec, huff_ac_dec, dquantz_lum, and j_rev_dct.

Figure 4. Data communication graph of the jpeg image converter application
Figure 4 illustrates the data communication graph, focusing mainly on the four mentioned functions. All of those functions are accelerated by hardware kernels. Similar to the Canny edge detection application, a NoC is used for the data communication of the first three kernels, while a shared local buffer is involved in transferring data between the dquantz_lum and j_rev_dct accelerators. As with the systems for the Canny edge detection application, Vivado HLS is also used to generate the Verilog-HDL descriptions of these functions from the C code.
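The interconnect choice reflects the per-block data flow through these four functions. The stand-in code below is assumed and heavily simplified (the benchmark's real functions parse an actual bitstream and use full quantization tables and a 2-D inverse DCT); it only mirrors which stage hands data to which, which is the traffic carried by the NoC and the shared buffer.

#include <stdio.h>

#define BLK 64  /* one 8x8 coefficient block */

/* Placeholder stage bodies; only the per-block data handoff matters here. */
static void huff_dc_dec(const unsigned char **bits, int *coef)
{
    coef[0] = **bits; (*bits)++;
}
static void huff_ac_dec(const unsigned char **bits, int *coef)
{
    for (int i = 1; i < BLK; i++) { coef[i] = **bits; (*bits)++; }
}
static void dquantz_lum(const int *coef, const int *qtab, int *dq)
{
    for (int i = 0; i < BLK; i++) dq[i] = coef[i] * qtab[i];
}
static void j_rev_dct(const int *dq, short *pixels)
{
    for (int i = 0; i < BLK; i++) pixels[i] = (short)(dq[i] >> 3); /* stand-in for the 2-D IDCT */
}

int main(void)
{
    unsigned char stream[BLK] = {16};     /* toy "bitstream" for one block */
    const unsigned char *p = stream;
    int qtab[BLK];
    for (int i = 0; i < BLK; i++) qtab[i] = 1;

    int coef[BLK], dq[BLK];
    short px[BLK];
    huff_dc_dec(&p, coef);        /* NoC traffic: huff_dc_dec -> huff_ac_dec        */
    huff_ac_dec(&p, coef);        /* NoC traffic: huff_ac_dec -> dquantz_lum        */
    dquantz_lum(coef, qtab, dq);
    j_rev_dct(dq, px);            /* shared-buffer traffic: dquantz_lum -> j_rev_dct */

    printf("first pixel of reconstructed block: %d\n", px[0]);
    return 0;
}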
This application is also implemented on both of the above embedded and high-performance computing systems. On the embedded system, only the huff_ac_dec kernel is duplicated, while 64 kernels are built in the high-performance computing system. Figure 5 illustrates the architecture of the jpeg application when implemented on the embedded platform.

Figure 5. The architecture for the jpeg application on the embedded platform
4. Experiments

In this section, we present our experiments to verify the FPGA-based accelerator systems reported above. Kernel performance as well as overall system performance are presented. Energy consumption compared to the baseline systems is also shown.

4.1 Experimental setup

As presented above, the embedded and high-performance computing systems used as our target platforms are the ML510 and the Micron HC-2ex, respectively. The ML510 board consists of only one Xilinx xc5vfx130t FPGA device. Inside the FPGA device, there are two embedded hardwired PowerPC processors that are used as host processors to process the software part of the applications. Hardware accelerators are mapped into the reconfigurable area of the device. In this paper, we configure the processors to run at 400 MHz, while the hardware kernels can work at 100 MHz only, due to the huge amount of reconfigurable logic resources used. To compare performance, the applications are first executed on the host processor. They are then processed by the entire accelerator systems, i.e., the hardware kernels process the computationally intensive functions while the software part is still executed on the host.

For the high-performance computing system, the Micron HC-2ex (formerly Convey) is used as our experimental platform. The system includes four Virtex-6 xc6vlx760 FPGA devices and one Intel Xeon X5670 processor. While the host processor can run at 2.93 GHz, the accelerators are set to work at 200 MHz. Similar to the embedded system, we first execute the applications on the host processor with full parallelization, i.e., all 12 cores of the processor are used to process the applications. They are then processed by the entire accelerator systems to compare execution time.

4.2 Experimental results

In this section, we discuss our experimental results with the two aforementioned systems when processing the two applications. The performance of the kernels and of the entire systems, as well as energy consumption and resource usage, are analyzed.

Performance analysis

Table 1 depicts the speed-ups of the kernels and of the overall accelerator systems when compared to their host processors (PowerPC for the embedded system and Intel Xeon for the high-performance computing system) in the third and fourth columns, respectively. The table also presents speed-ups compared to the baseline systems (the baseline systems are traditional accelerator systems built without the help of our framework for optimizing data communication, but including replicated kernels for a fair comparison) in the fifth and sixth columns, respectively.

Table 1. Speed-ups comparison between our systems and others

Platform   App.     w.r.t. host processors     w.r.t. baseline systems
                    kernels     overall        kernels     overall
EMB(1)     Canny    3.88×       3.15×          2.12×       1.83×
           jpeg     2.55×       2.33×          3.08×       2.87×
HPC(2)     Canny    2.62×       2.61×          1.55×       1.54×
           jpeg     1.96×       1.45×          1.93×       1.42×
(1) EMB: Embedded system; (2) HPC: High-performance computing system

As shown in the table, compared to both the host processors and the baseline systems, our systems designed with the framework achieve better performance in terms of execution time; in other words, our systems outperform the others in terms of execution time. Our systems process the applications up to 3.15× faster than the general-purpose processors and up to 2.87× faster than the baseline systems.
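The gap between the kernel-level and the overall speed-ups is the usual Amdahl's-law effect of the software part that stays on the host. As an illustrative back-of-envelope check (our own calculation from the table, not a figure reported in the paper), the EMB/Canny row is consistent with roughly 92% of the original execution time being spent in the accelerated functions:

\[
S_{\mathrm{overall}} = \frac{1}{(1-f) + f / S_{\mathrm{kernels}}},
\qquad
3.15 \approx \frac{1}{(1-f) + f/3.88} \;\Rightarrow\; f \approx 0.92 .
\]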
Programmable Gate Arrays, ser. FPGA '19. New York, NY, USA: ACM, 2019, pp. 306–307.
[9] D. Sanchez, G. Michelogiannakis, and C. Kozyrakis, "An analysis of on-chip interconnection networks for large-scale chip multiprocessors," ACM Trans. Archit. Code Optim., vol. 7, no. 1, pp. 4:1–4:28, May 2010.
[10] C. Pham-Quoc, Z. Al-Ars, and K. Bertels, "Heterogeneous hardware accelerators interconnect: An overview," in 2013 NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013), June 2013, pp. 189–197.
[11] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 679–698, 1986.
[12] J. Scott, L. H. Lee, J. Arends, and B. Moyer, "Designing the low-power M•CORE architecture," in IEEE Power Driven Microarchitecture Workshop, 1998.
[13] Xilinx, "ML510 reference design," 2009.
[14] Micron, "Hybrid core computer," 2012.