Microprocessors and Microsystems: Prateek Sikka, Abhijit R. Asati, Chandra Shekhar
Real time FPGA implementation of a high speed and area optimized Harris
corner detection algorithm
Prateek Sikka *, Abhijit R. Asati, Chandra Shekhar
Electrical and Electronics Engineering Department, Birla Institute of Technology and Science, Vidya Vihar Campus, Pilani, Rajasthan, India
ARTICLE INFO

Keywords: Hardware description language; Register transfer language; Field-programmable gate array; High-Level Synthesis; Harris corner detection; Vivado; MATLAB HDL coder

ABSTRACT

Harris corner detection is an algorithm frequently used in image processing and computer vision applications to detect corners in an input image. In most modern applications of image processing, there is a need for real time implementation of algorithms such as Harris corner detection in hardware systems such as field-programmable gate arrays (FPGAs). FPGAs allow faster algorithmic throughput, which is required to match real time speeds or cases where there is a requirement to process faster data rates. High level synthesis tools offer higher abstraction level to designers with continued verification during the design flow and hence are getting popular with the design community. This paper proposes a high speed and area optimized implementation of a Harris corner detection algorithm. The proposed implementation was actualized using a novel high-level synthesis (HLS) design method based on application-specific bit widths for intermediate data nodes. Register transfer level (RTL) code was generated using MATLAB HDL coder for HLS. The generated hardware description language (HDL) code was implemented on Xilinx ZedBoard using Vivado software and verified for functionality in real time with input video stream. The obtained results are superior to those of previous implementations in terms of area (smaller gate count on target FPGA) and speed for the same target board.
* Corresponding author.
https://doi.org/10.1016/j.micpro.2020.103514
Received 27 July 2020; Received in revised form 11 September 2020; Accepted 20 November 2020
Available online 26 November 2020
0141-9331/© 2020 Elsevier B.V. All rights reserved.
P. Sikka et al. Microprocessors and Microsystems 80 (2021) 103514
[...] modified Harris corner detector for breast cancer detection from MRI and x-ray images. They used an automated adaptive radius suppression technique, which reduces corner clustering, thus avoiding the loss of useful corners due to over-suppression. Vourvoulakis et al. [8] proposed a generic image feature extractor algorithm and implemented it on a Cyclone 4 FPGA for real time processing, achieving a frame rate of 70 frames per second. Ahmed and Sidek [9] used dynamic partial reconfiguration to design a self-adaptive system on chip for the Harris corner detection algorithm. Their implementation dissipated less power with a small overhead on performance.

This paper proposes a real time implementation of the Harris corner detector on a Xilinx ZedBoard and demonstrates that the implementation is superior in terms of speed and area usage on the FPGA to previous implementations. The design was developed using a novel high-level design method that synthesizes the design with intermediate signal widths constrained according to the application (input stimulus). The remainder of this paper is organized as follows: Section 2 presents an introduction to high-level synthesis. Section 3 explains the architecture of the Harris corner detector. Section 4 explains the methodology used in the proposed design. Section 5 presents simulation and synthesis results, as well as a comparison with the results presented by other researchers. Section 6 presents concluding remarks.

2. High-level synthesis

High-level synthesis (HLS) is gaining momentum as a methodology that can ensure continued verification throughout the design cycle while allowing designers to describe design behaviors at high abstraction levels. HLS tools include Vivado HLS [10] and MATLAB HDL Coder [11] as well as several open-source tools. Digital designers and architects typically use these to design and deploy algorithms that target varied aerospace, communications, image processing, deep learning, and neural network applications. HLS tools help reduce code complexity by factors ranging from seven to ten. They allow for behavioral IP to be re-used across projects and enable verification teams to use high-abstraction-level modeling techniques such as transaction-level modeling [12].

Furthermore, most contemporary chip systems have embedded processors. Additional software or firmware is involved in the design process owing to the co-existence of microprocessors, digital signal processors (DSPs), memories, and custom logic on a single chip. Hence, an automated HLS process allows designers and architects to experiment with different algorithmic and implementation choices to explore various area, power, and performance tradeoffs from a common functional specification.

Accordingly, the industrial deployment of HLS tools has become more practical with improvements in register transfer level (RTL) synthesis tools. Proprietary tools have been built by major semiconductor design houses, including IBM [13], Motorola [14], Philips [15], and Siemens [16]. Major Electronic Design Automation (EDA) vendors have also begun to commercialize different HLS tools. In 1995, for instance, Synopsys introduced the “Behavioral Compiler” tool [17], which generates RTL implementations from behavioral hardware description language (HDL) code and connects to downstream tools. Similar tools include “Catapult HLS” from Mentor Graphics [18] and “Stratus High Level Synthesis” from Cadence [19]. A typical flow for HLS in VLSI designs is shown in Fig. 1.

3. Harris corner detector

In images, a corner is a point where the gradients along both orthogonal axes are high. The Harris corner detection algorithm detects points in the image where these conditions are met. The algorithm takes a small region (window) of an image and determines whether the window contains corner features. Assume I(x,y) is the image point; the gradient matrix M can be calculated using Eq. (1) as follows:
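For reference, the standard form of the Harris gradient matrix can be written as below. The symbols I_x and I_y (partial derivatives of I along the x and y axes) are notation assumed here, and some formulations additionally weight each term with a Gaussian window function.

```latex
M = \sum_{(i,j) \in W}
\begin{bmatrix}
I_x^2(i,j) & I_x(i,j)\, I_y(i,j) \\
I_x(i,j)\, I_y(i,j) & I_y^2(i,j)
\end{bmatrix}
```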
where i and j are the pixel indices of a window of range W. If both eigenvalues of matrix M are high, the point is a corner feature point. Finding the eigenvalues is computationally intensive; thus, an approximation formula is used instead, as presented in Eq. (2).

R = \det(M) - k \, \mathrm{Trace}(M)^{2}    (2)

where the value of k is approximately 0.04–0.06. If R is larger than the threshold, the center pixel in the window is a corner feature candidate. By scanning the windows at different positions of the entire image, all feature candidates can be found. The Harris algorithm comprises five primary steps: gradient calculation, Gaussian smoothing, Harris value calculation, thresholding, and non-maxima suppression. Fig. 2 depicts the cascaded stages of the Harris corner detection algorithm.

4. Design methodology

In this study, a cascaded model of the Harris corner detector was created in MATLAB using input video frames of resolution 240 (width) × 320 (height) × 3 (colors) with eight bits representing each color. Because the HDL implementation operates on pixels instead of frames, the input frames were converted to pixels and one pixel was sent as an input to the design per clock cycle. A Simulink library block for the Harris corner detector was also run concurrently on the same input images, and its results were subsequently compared against the hardware implementation.

During the simulation time of 100 s (the duration of the video stream), the absolute minima and maxima for the inputs, outputs, and intermediate nodes in the design were recorded. This database of minima and maxima was appended on subsequent simulation runs until all varied video signal inputs were covered. Subsequently, the width for all the signals was recalculated based on the range of the values for each input, output, and intermediate node. The HLS tool was then constrained to generate the RTL with the updated constraints for all the data nodes along with the primary inputs and outputs. The optimized RTL was then [...] balance the delay of the actual cycle accurate hardware implementation running on the FPGA.

Fig. 4 displays the input image, the output image from the MATLAB model, and the output from the HDL implementation on the FPGA. As shown in the same figure, the input image source, the MATLAB behavioral model, and the FPGA HDL model ran independently and processed different frames from the input video stream.

The optimum datatypes and data widths selected for the model in RTL were back-annotated to the HLS model. The back-annotated datatypes for an intermediate stage (gradient calculation) in the model are shown in Fig. 5. In the highlighted example, two 18-bit signed fixed-point numbers produced a 36-bit result. In the pessimistic data width model, all widths were 32-bit, which is suboptimal.

Although this method was used for a MATLAB HDL coder-based design for Harris corner detection, the generic methodology is compatible with all HLS tools and all FPGA applications. This method is easily scalable for any number of inputs, outputs, and internal signals that the design may contain. In this study, the calculation of minima and maxima for each node was fully automated and did not require any manual intervention. Even though we chose 8 bits to represent each color, the optimization method is independent of that choice. This means we could choose any other datatype and any frame rate for the input image. Furthermore, all stages of the algorithm depicted in Fig. 2 are fully pipelined. This leads to a better design throughput at the cost of a slightly increased number of flip-flops.

5. Results and discussion

5.1. FPGA implementation

The RTL code from the HLS model was generated using MATLAB HDL coder with stimulus-aware bit widths. The RTL code was synthesized using Vivado 2019.1 software and implemented on the Xilinx ZedBoard.
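The stimulus-aware width selection described in Section 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB HDL coder flow used in this study: the names `signed_bits`, `RangeLog`, and `harris_nodes` are hypothetical, Gaussian smoothing is replaced by a 3 × 3 box sum, k is approximated as 1/20, thresholding and non-maxima suppression are omitted, and every node is treated as a signed integer.

```python
def signed_bits(lo, hi):
    """Smallest two's-complement width that holds every integer in [lo, hi]."""
    bits = 1
    while lo < -(1 << (bits - 1)) or hi > (1 << (bits - 1)) - 1:
        bits += 1
    return bits


class RangeLog:
    """Running minima and maxima per named node, kept across simulation runs."""

    def __init__(self):
        self.ranges = {}

    def record(self, node, value):
        lo, hi = self.ranges.get(node, (value, value))
        self.ranges[node] = (min(lo, value), max(hi, value))
        return value

    def widths(self):
        return {n: signed_bits(lo, hi) for n, (lo, hi) in self.ranges.items()}


def harris_nodes(frame, log, k_den=20):
    """Run one frame through gradient and response stages, logging each node.

    Assumptions: central-difference gradients, 3x3 box sum instead of a
    Gaussian, and k = 1 / k_den so that all arithmetic stays in integers.
    """
    h, w = len(frame), len(frame[0])
    ix = [[0] * w for _ in range(h)]
    iy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            log.record("pixel", frame[y][x])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = log.record("ix", frame[y][x + 1] - frame[y][x - 1])
            iy[y][x] = log.record("iy", frame[y + 1][x] - frame[y - 1][x])
    response = {}
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            a = b = c = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    a += gx * gx
                    b += gy * gy
                    c += gx * gy
            log.record("sum_ix2", a)
            log.record("sum_iy2", b)
            log.record("sum_ixiy", c)
            det = log.record("det", a * b - c * c)
            trace = log.record("trace", a + b)
            response[(y, x)] = log.record(
                "response", det - (trace * trace) // k_den)
    return response


# Two synthetic 7x7 frames stand in for the video stimulus (hypothetical data).
frame = [[255 if (y >= 3 and x >= 3) else 0 for x in range(7)] for y in range(7)]
inverted = [[255 - v for v in row] for row in frame]

log = RangeLog()
for stimulus in (frame, inverted):
    harris_nodes(stimulus, log)  # the range database grows with each run

for node, bits in log.widths().items():
    print(node, log.ranges[node], bits)
```

The same helper reproduces the product width quoted for Fig. 5: the product of two 18-bit signed operands spans from -(2^17)(2^17 - 1) to 2^34, for which `signed_bits` returns 36. Widths derived this way are only as trustworthy as the stimulus coverage, which is why the range database is appended across runs rather than reset.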
Fig. 4. Input image, output image from model, output image from HDL (FPGA).
5.3. Synthesis results

The proposed Harris corner detector design was created using optimum bit-width data types, as explained in Section 4, and simulated on XSim, as mentioned in Section 5.2. The same Verilog design was synthesized using Vivado 2019.1 targeting the Xilinx ZedBoard. Table 1 presents the synthesis and implementation results for the proposed design.

The total power reported by the Vivado synthesis tool was 0.315 W, comprising 0.205 W dynamic power and 0.110 W static power. High-level synthesis tools also offer additional optimization techniques such as resource sharing and pipelining. These can be used to further improve the above results; however, such optimization is beyond the scope of this study.

5.4. Comparison results

The synthesis results of the proposed algorithm were compared to other algorithms reported in the literature [3,4,6] along with a reference design created by us without using the proposed methodology. Table 2 summarizes the results of this comparison. As presented in Table 2, the implementation proposed in this study is better in terms of frame rate and area usage for the same target FPGA than other implementations.

With regard to area, even with the assumption that a Look Up Table (LUT) uses two Application Specific Integrated Circuit (ASIC) gates, our implementation used almost 50% fewer resources than Komorkiewicz et al., 30% fewer resources than Liu et al., and 20% fewer resources than Chao et al. Furthermore, regarding the frame rate, our implementation performs better than other implementations owing to reduced design complexity: critical paths are shorter, and hence a higher operating frequency is achieved on the FPGA. Other authors have not shared power dissipation data; however, the [...]

Table 2
Comparison of the results for the Harris corner detector.
Resource | Proposed Implementation | Reference 3 Komorkiewicz et al. | Reference 4 Liu et al. | Reference 6 Chao et al.

6. Conclusion

This paper proposed a high speed and low area implementation of the Harris corner detection algorithm that is suitable for real time deployment on FPGAs. The design was actualized using a novel HLS design method that constrains the intermediate nodes and primary ports of the design in accordance with the absolute minima and maxima for each node. The generated RTL for the algorithm was implemented on a Xilinx ZedBoard using a real time video stream as the input with a resolution of 240 × 320 and 8-bit color inputs. The RTL design was functionally verified using the Vivado XSim simulator from Xilinx. We observed that the quantization errors introduced in the implementation using our HLS method were less than 1%. The synthesis results suggest that our implementation performs better than similar extant architectures. Hence, this implementation is well-suited for applications on FPGAs that require real time image processing. Even though this was verified for the MATLAB HDL coder workflow targeting the Xilinx ZedBoard, the method is applicable to other tools and other FPGA targets as well (technology independent). Even though we do not have results, by virtue of the design method, we also believe that this methodology would give better results with ASIC synthesis. Further work will involve improvement of the speed, area, and power usage by using optimization methodologies offered by high-level synthesis tool vendors such as MathWorks, Xilinx, Mentor, and Cadence in addition to the proposed implementation method. We also plan to extend this study to ASIC designs in future.

Funding

This research did not receive any grants from funding agencies in the public, commercial, or not-for-profit sector.
Geolocation information

India.

Declaration of Competing Interest

None.

Acknowledgments

None.

References

[1] V.H. Schulz, F.G. Bombardelli, E. Todt, A Harris corner detector implementation in SoC-FPGA for visual SLAM, in: F. Santos Osório, R. Sales Gonçalves (Eds.), Robotics. SBR 2016, LARS 2016. Communications in Computer and Information Science 619, Springer, Cham, 2016.
[2] C. Cabani, Implementation of an Affine-Invariant Feature Detector in Field-Programmable Gate Arrays, University of Toronto, 2006, pp. 5–13.
[3] M. Komorkiewicz, T. Kryjak, K. Chuchacz-Kowalczyk, P. Skruch, M. Gorgoń, FPGA based system for real time structure from motion computation, in: 2015 Conference on Design and Architectures for Signal and Image Processing (DASIP), Krakow, 2015, pp. 1–7, https://doi.org/10.1109/DASIP.2015.7367241.
[4] S. Liu, Real time implementation of Harris corner detection system based on FPGA, in: 2017 IEEE International Conference on Real time Computing and Robotics (RCAR), Okinawa, 2017, pp. 339–343, https://doi.org/10.1109/RCAR.2017.8311884.
[5] C. Xu, B. Yunshan, Implementation of Harris corner matching based on FPGA, in: 2017 6th International Conference on Energy and Environmental Protection (ICEEP 2017), Atlantis Press, 2017.
[6] T.L. Chao, H.W. Kin, An efficient FPGA implementation of the Harris corner feature detector, in: 2015 14th IAPR International Conference on Machine Vision Applications (MVA), IEEE, 2015.
[7] C.Y. Lee, H.J. Wang, C.M. Chen, C.C. Chuang, Y.C. Chang, N.S. Chou, A modified Harris corner detection for breast IR image, Math. Probl. Eng. 2014 (2014).
[8] J. Vourvoulakis, J. Kalomiros, J. Lygouras, Fully pipelined FPGA-based architecture for real-time SIFT extraction, Microprocess. Microsyst. 40 (2016) 53–73.
[9] H. Ahmed, O. Sidek, An energy-aware self-adaptive System-on-Chip architecture for real-time Harris corner detection with multi-resolution support, Microprocess. Microsyst. 49 (2017) 164–178.
[10] Xilinx, (2018) Vivado design suite: high-level synthesis. https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_3/ug902-vivado-high-level-synthesis.pdf (accessed 12 Sep, 2019).
[11] MathWorks HDL coder, (2019) https://www.mathworks.com/products/hdl-coder.html (accessed 14 Aug, 2019).
[12] J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, Z. Zhang, High-level synthesis for FPGAs: from prototyping to deployment, IEEE T. Comput. Aid. D 30 (2011) 473–491. https://doi.org/10.1109/TCAD.2011.211059.
[13] R.A. Bergamaschi, R.A. O’Connor, L. Stok, M.Z. Moricz, S. Prakash, A. Kuehlmann, D.S. Rao, High-level synthesis in an industrial environment, IBM J. Res. Develop. 39 (1995) 131–148. https://doi.org/10.1147/rd.391.0131.
[14] K. Kucukcakar, C.T. Chen, J. Gong, W. Philipsen, T.E. Tkacik, Matisse: an architectural design tool for commodity ICs, IEEE Des. Test Comput. 15 (1998) 22–33. https://doi.org/10.1109/54.679205.
[15] P.E.R. Lippens, J.L. van Meerbergen, A. van der Werf, W.F.J. Verhaegh, B.T. McSweeney, J.O. Huisken, O.P. McArdle, PHIDEO: a silicon compiler for high speed algorithms, in: Proceedings of the European Conference on Design Automation, IEEE Computer Society Press, Amsterdam, 1991, pp. 436–441.
[16] J. Biesenack, M. Koster, A. Langmaier, S. Ledeux, S. Marz, M. Payer, M. Pilsl, S. Rumler, H. Soukup, N. Wehn, P. Duzy, The Siemens high-level synthesis system CALLAS, IEEE Trans. Very Large Scale Integr. Syst. 1 (1993) 244–253. https://doi.org/10.1109/92.238438.
[17] D.W. Knapp, Behavioral Synthesis: Digital System Design Using the Synopsys Behavioral Compiler, Prentice-Hall, Englewood Cliffs, NJ, 1996.
[18] Catapult HLS, (2019) https://www.mentor.com/hls-lp/catapult-high-level-synthesis/.
[19] Stratus HLS, (2019) https://www.cadence.com/content/cadence-www/global/en_US/home/tools/digital-design-and-signoff/synthesis/stratus-high-level-synthesis.html.
[20] MathWorks HDL Verifier, (2019) https://in.mathworks.com/products/hdl-verifier.html.

Prateek Sikka is currently pursuing his doctorate at BITS Pilani, Rajasthan, India while working for NXP Semiconductors in Noida, Uttar Pradesh, India. He has over 13 years of experience serving in multiple EDA and semiconductor organizations, such as STMicroelectronics, Cadence, and Mentor Graphics in both R&D and customer facing roles. His research areas include digital systems design and optimization, FPGA prototyping and emulation, and high-level synthesis. He holds five patents in the same field and has presented at multiple conferences. He holds a Bachelor of Engineering degree from Thapar Institute Patiala and a Master of Technology degree in Integrated Electronics from IIT, Delhi.

Dr. Abhijit Asati is a senior member at IEEE and is currently an associate professor in the Electronics and Electrical Engineering department at BITS, Pilani, Rajasthan, India. He completed his doctorate in 2010, his Master of Engineering degree in 2002 from BITS Pilani, and his Bachelor of Engineering in 1996 from Amravati University. He has taught several courses in electrical and electronics engineering and has supervised three doctoral and 25 bachelors’ and masters’ theses. He taught at VNIT Nagpur from 1997 to 1999. He has contributed to over 30 journal and conference papers in the fields of microelectronics, VLSI design, and embedded systems.

Dr. Chandra Shekhar currently serves as Professor Emeritus at BITS, Pilani, Rajasthan, India. After attaining his PhD, he joined the Solid State Devices Division at CEERI, Pilani in 1977 as a scientist. Having worked for IC Design related activities, he also served as the director at CEERI, Pilani. He was awarded the UNESCO/ROSTSCA Young Scientist Award in 1986, the CEERI Foundation Day Merit award in 1988, the ISHEER Science Councilor Award in 2005, the Prof. L K Maheshwari Foundation Distinguished Alumnus Award in 2010, a Doctor of Science (honoris causa) by NIT Kurukshetra in 2012, and the IETE Diamond Jubilee Gold Medal award in 2013.