ABSTRACT
Advancements in image and video processing have grown over the years for a wide range of applications. A major challenge in image and video processing is the execution of complex functions and computationally intensive tasks. To overcome this issue, hardware acceleration of different filter algorithms for both image and video processing is implemented on a Xilinx Zynq®-7000 System-on-Chip (SoC) with the help of software libraries using Vivado® High-Level Synthesis (HLS).
There are a few reasons why tracking is preferable to detecting objects in each frame. Tracking helps maintain the identity of individual items across frames when there are several objects. Object detection may fail in some instances, but tracking may still be achievable because it takes into account the location and appearance of the object in the previous frame. The key hurdles in real-time image and video processing applications are object tracking and motion detection. Some tracking algorithms are extremely fast because they perform a local search rather than a global search. Tracking algorithms such as mean-shift, multiple hypothesis tracking (MHT), probabilistic data association, particle filter, nearest neighbor, Kalman filter and interacting multiple model (IMM) are available to estimate and predict the state of a system.
For linear models, the Kalman filter is the most widely used prediction algorithm as it is simple, efficient and easy to implement. However, these types of filters demand hardware platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) to achieve the design requirements of embedded applications. The research work also proposes a multi-dimensional Kalman filter implemented with FPGAs. FPGAs, with their reconfigurable architectures, provide a suitable platform, and the design is implemented on the PYNQ-Z2 SoC for real-time object detection. The proposed work shows better resource utilization of about 174 (79%) DSPs, 115 (82.17%) BRAMs, 45.8k (43.04%) Flip-flops and 23.2k (43.6%) Look-up tables (LUTs) at 100 MHz, with a performance of 9.14 GOP/s, which is 2x more efficient than existing implementations.
CHAPTER 1
INTRODUCTION
Most current FPGAs have multi-core hard processors, GPUs and different I/O
ports that may interface with the low-level architectures in the FPGA fabric. As a result,
current System-on-Chip (SoC) FPGAs are more powerful than ever before for processing
high-end applications. Based on the programming technology, FPGAs can be classified into three categories: SRAM-based, flash-based and anti-fuse FPGAs. A comparison of these FPGAs in terms of various parameters is shown in Table 1.1.
Table 1.2 lists the power consumption analysis between SRAM-based and flash-based FPGAs. SRAM-based FPGAs have the limitation of consuming larger static power compared with flash-based FPGAs. Apart from this limitation, SRAM-based FPGAs are widely used for their significant performance and their ability to be reprogrammed. Hence, these types of FPGAs support the integration of embedded systems for various applications. The major vendors of FPGAs and SoCs are Xilinx® and Intel®. Some of the target applications of prominent FPGAs from Xilinx® are shown in Figure 1.2. Since the research is focused on image processing applications, the PYNQ-Z2 SoC is chosen as the primary hardware.
PYNQ-Z2 APSoCs are distinct from all other Xilinx FPGA families. The device is built with a dual-core ARM Cortex-A9 Processing System (PS), Advanced Microcontroller Bus Architecture (AMBA) interconnects and a variety of peripherals including a USB JTAG interface, Quad SPI flash memory, UART, CAN and Ethernet, as well as Xilinx Programmable Logic (PL) of the Artix-7 series [2]. Figure 1.4 shows the schematic view of the Xilinx SoC device. Figure 1.5 gives an overview of the FPGA hardware, which is chosen as the primary hardware for the proposed design to meet the prerequisites. The significant features of the PYNQ-Z2 SoC are listed below [3].
Memory
- 32-bit data width support
- IIC - 1 KB EEPROM
- 16 MB Quad SPI flash
- 1 GB DDR3 component memory

Configuration
- USB JTAG configuration port (Digilent)
- 16 MB Quad SPI flash

Communication
- USB OTG 1 (PS) - host USB
- USB UART (PS)
- IIC bus headers/hub (PS)

CLB
- Look-up tables (LUTs)
- Adders
- Flip-flops (FFs)

DSP blocks
- 48-bit adder/accumulator
- 18 x 25 signed multiply
- 25-bit pre-adder

Application Processor Unit (APU)
- CoreSight™ and Program Trace Macrocell (PTM)
- NEON™ media-processing engine
- Coherent multiprocessor support
- Vector Floating Point Unit (VFPU)
- CPU frequency: up to 1 GHz

Clocking
- 156.25 MHz I2C programmable oscillator (differential LVDS)
- 200 MHz fixed PL oscillator (differential LVDS)
- 33.33 MHz fixed PS system oscillator (single-ended CMOS)

Board peripherals
- Secure Digital (SD) connector
- ADV7511 HDMI codec
- Status LEDs
- SoC PS reset pushbuttons
- IIC bus multiplexing
- FMC1 LPC connector
- FMC2 LPC connector
- RTC-8564JE real-time clock
- PMBUS data/clock
- M24C08 EEPROM (1 kB)

Interfacing PS to PL
- 2x AXI 32-bit Master, 2x AXI 32-bit Slave
- 4x AXI 64-bit/32-bit memory ports
- AXI 64-bit ACP
- 16 interrupts
- 8 DMA channels

On-chip memory
- L1 cache (32 kB), L2 cache (512 kB)
- On-chip memory (256 kB)

Security
- AES and SHA
- RSA authentication
Figure 1.4. Schematic View of PL and PS Portions of the Zynq Z2 SoC
Vivado Design Suite, created by Xilinx, is used for HDL design synthesis and analysis [4]. Vivado is an IDE that allows users to create low-level hardware designs for Xilinx FPGAs. The suite includes a plethora of Xilinx-developed intellectual property (IP) that may be included in designs to minimise development time. Users can also create their own HDL-based IP for application modification with Vivado. Hardware designs can be developed as a set of HDL files that are linked together, or by using the built-in block-diagram GUI, which allows users to drop in IP blocks and manually connect signals. When a design is finished, Vivado can output a bitstream file that can be used to configure the FPGA.
The Vivado 2020.1 SDK tool was used in this research work to build high-level software designs that operate on the FPGA processors and interface with the hardware design in the FPGA fabric. These software designs are in charge of retrieving parameter and frame data from the FPGA's I/O ports and writing it to BRAM. The SDK includes a graphical user interface (GUI) for developing applications directly on the MicroBlaze® soft processor found in ZC702 FPGAs and on the dual-core ARM Cortex-A9 CPU. It differs from the standard Eclipse IDE in that it can import Vivado-generated hardware designs, create and configure Board Support Packages (BSPs), support single-processor and multi-processor development for FPGA-based software applications, and include off-the-shelf software reference designs that can be used to test the application's hardware and software functionality.
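Since the target board is also supported by the PYNQ framework, a generated bitstream can alternatively be exercised from Python running on the ARM PS. The sketch below is illustrative only; the overlay file name, the IP instance name and the register offsets are placeholders rather than part of the actual design.

# Illustrative only: loading a Vivado-generated bitstream on the PYNQ-Z2 PS and
# driving an accelerator IP through memory-mapped registers.
# "design.bit", "accel_ip" and the register offsets are placeholders.
from pynq import Overlay, MMIO

overlay = Overlay("design.bit")            # programs the PL with the bitstream
ip_info = overlay.ip_dict["accel_ip"]      # address-map entry of the IP

mmio = MMIO(ip_info["phys_addr"], ip_info["addr_range"])
mmio.write(0x10, 640)                      # hypothetical frame-width register
mmio.write(0x18, 480)                      # hypothetical frame-height register
mmio.write(0x00, 1)                        # hypothetical start bit
while (mmio.read(0x00) & 0x2) == 0:        # poll a hypothetical done bit
    pass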
FPGAs are often considered the first option for true hardware acceleration since they have a reconfigurable fabric that can express a software programme as logic gates. The trade-off between flexibility, performance and power consumption is constantly examined when considering hardware platforms to accelerate domain-specific applications. FPGAs fall somewhere between the two extremes and provide a good balance among these three measures [5].
FPGA-based hardware acceleration of image and video processing techniques provides high performance and parallelism. The major concern in the implementation process is the effective utilization of hardware resources such as BRAMs, DSP slices, LUTs, FFs and PLBs. The challenging task in real-time image processing is the tracking of multiple objects. For object tracking algorithms, accuracy and speed are considered the primary parameters for evaluation and validation. CNN-based tracking algorithms are not time-efficient, and feature extraction involves a multi-layer network to perform the operation. These characteristics prompted the researchers to choose FPGAs to implement tracking and prediction.
To accelerate the CNN on an FPGA with minimum prediction time compared with other hardware accelerators.
CHAPTER 3
The CNN uses the Darknet-53 framework for object detection, which is a fast and reliable state-of-the-art algorithm [62]. As the name suggests, Darknet-53 has 53 convolutional layers. CNNv4 is faster than its previous version, CNNv3. The CNN consists of Darknet as its backbone with CNNv3 on top of it. The middle layers include the Path Aggregation Network (PAN) [60], Spatial Pyramid Pooling (SPP) and Feature Pyramid Network (FPN) [61]. The generic architecture of the CNN is shown in Figure 3.1. To extract features from images, it uses convolutional layers, and bounding boxes are predicted by regression from anchor boxes with a k x k kernel size to generate the feature map. It also contains pooling layers, which down-sample each input map. A fully connected (FC) layer can act as a classifier. Non-linearity layers with activation functions enhance the fitting ability of the neural network; the most commonly used are the Rectified Linear Unit (ReLU), Leaky ReLU, sigmoid and hyperbolic tangent functions. The important measurements for CNN object detection are discussed in Section 3.3.
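As an illustration of these non-linearities, a minimal NumPy sketch of the four activation functions is given below; the 0.1 slope used for Leaky ReLU is an assumed value rather than one taken from this work.

import numpy as np

def relu(x):
    # Rectified Linear Unit: passes positive values, zeroes out negatives
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.1):
    # Leaky ReLU keeps a small slope (alpha, assumed 0.1 here) for negative inputs
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    # squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # hyperbolic tangent, range (-1, 1)
    return np.tanh(x)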
3.2 DATASETS
In this work, the CNN network is used for object detection on the MS-COCO benchmark dataset, which has a greater number of categories and instances than PASCAL VOC and ImageNet [101]. MS-COCO contains images of 91 different object categories with over 2.5 million annotated instances. Compared with the ImageNet and PASCAL VOC benchmark datasets, only about 10% of MS-COCO images contain a single object category per image.
Recall = TP / (FN + TP)                                                         (3.2)

IoU = TP / (FP + TP + FN)                                                       (3.3)

IoU = Area of Overlap / Area of Union = (Area_pred ∩ Area_gt) / (Area_pred ∪ Area_gt)   (3.4)

AP = (1 / |classes|) · Σ_{c ∈ classes} TP / (TP + FP)                           (3.5)
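To make the IoU definition of Equation (3.4) concrete, the following is a minimal sketch that computes the overlap of a predicted and a ground-truth box; both boxes are assumed to be given as (x, y, w, h) with (x, y) the top-left corner.

def iou(box_pred, box_gt):
    # boxes are (x, y, w, h) with (x, y) the top-left corner (an assumed convention)
    x1, y1, w1, h1 = box_pred
    x2, y2, w2, h2 = box_gt
    # intersection rectangle between the two boxes
    ix1, iy1 = max(x1, x2), max(y1, y2)
    ix2, iy2 = min(x1 + w1, x2 + w2), min(y1 + h1, y2 + h2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union if union > 0 else 0.0

# example: iou((10, 10, 50, 50), (30, 30, 50, 50)) is roughly 0.22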
The CNN network operates on a horizontal bounding box to locate the position of the target, as shown in Figure 3.3. The dimensional vector of a bounding box is denoted by bx, by, bw and bh, where (bx, by) represents the centre of the bounding box and bw and bh are its width and height, as given in Equations (3.6), (3.7), (3.8) and (3.9) respectively. The dimensions of the predefined box are given by pw and ph. For selecting bounding boxes, greedy non-maximum suppression (NMS) is used. The activation function is denoted fa, and (Cx, Cy) are the coordinates of the top-left corner of the anchor box.
𝑏𝑥 = 𝑓𝑎(𝑡𝑥) + 𝐶𝑥 (3.6)
𝑏𝑦 = 𝑓𝑎(𝑡𝑦) + 𝐶𝑦 (3.7)
𝑏𝑤 = 𝑝𝑤 . 𝑒𝑡𝑤 (3.8)
𝑏ℎ = 𝑝ℎ . 𝑒𝑡ℎ (3.9)
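The following is a small sketch of the decoding in Equations (3.6)-(3.9); the sigmoid is assumed as the activation fa, which is the usual choice for constraining the centre offsets, and the raw network outputs tx, ty, tw, th together with the prior dimensions pw, ph are the inputs.

import numpy as np

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    # fa is assumed to be the sigmoid so that the predicted centre stays near its anchor
    fa = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = fa(tx) + cx        # Equation (3.6)
    by = fa(ty) + cy        # Equation (3.7)
    bw = pw * np.exp(tw)    # Equation (3.8)
    bh = ph * np.exp(th)    # Equation (3.9)
    return bx, by, bw, bh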
The width and height of the box are given by w and h respectively. The number of grid cells and the number of classes are denoted S² and C respectively. B is the number of bounding boxes predicted by each grid cell, and p denotes the probability. λnoobj and λcoord are the loss parameters that control the stability of training.
Algorithm: Bounding Box
Inputs: Coordinates x and y; image dimensions img_w (width) and img_h (height).
Output: Boxes drawn over the predicted regions.
Step 1: Get the output layers and initialise the confidence threshold to 0.5 and the greedy non-maximum suppression (NMS) threshold to 0.4.

import cv2
import numpy as np

def get_output_layers(net):
    # names of the unconnected (output) layers of the Darknet model
    layer_names = net.getLayerNames()
    return [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

outs = net.forward(get_output_layers(net))        # forward pass through the network

class_ids, confidences, boxes = [], [], []
confid_thres, nms_thres = 0.5, 0.4

for out in outs:
    for detection in out:
        scores = detection[5:]                    # per-class confidences of this detection
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > confid_thres:             # neglect detections below the confidence threshold
            center_x = int(detection[0] * img_w)
            center_y = int(detection[1] * img_h)
            w = int(detection[2] * img_w)
            h = int(detection[3] * img_h)
            x = center_x - w / 2                  # top-left corner of the bounding box
            y = center_y - h / 2
            class_ids.append(class_id)
            confidences.append(confidence)
            boxes.append([x, y, w, h])

# greedy non-maximum suppression keeps one box per detected object
indices = cv2.dnn.NMSBoxes(boxes, confidences, confid_thres, nms_thres)
for i in np.array(indices).flatten():
    x, y, w, h = boxes[i]
    # draw_bounding_box is the user-defined drawing helper of this listing
    draw_bounding_box(img, class_ids[i], confidences[i],
                      round(x), round(y), round(x + w), round(y + h))
cv2.imshow("object detection", img)               # display the output image
3.4 PROPOSED ACCELERATION OF CNN ALGORITHM
D̄_n = D_n + D̄_(n-1)                                                            (3.15)
However, because the queue's size is finite, delay accumulates until the queue reaches its maximum size. Because the proposed accelerator serves the most recent frame, this cumulative delay does not occur, demonstrating that each frame can be handled within its deadline. In the context of the CNN algorithm, the waiting time of the frames kept in the queue compounds as time passes. For example, when Ta is larger than or equal to Ds, the system is operating as a high-performance hardware system and the object detection service time is faster than the input rate.
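The argument above can be illustrated with a small simulation sketch; the arrival interval Ta and service time Ds below are assumed example values. With a FIFO queue the waiting time compounds whenever Ds exceeds Ta, whereas serving only the most recent frame keeps the per-frame delay bounded by the service time.

# Sketch: cumulative delay of a FIFO queue versus serving only the latest frame.
# Ta (arrival interval) and Ds (service time) are assumed example values in ms.
Ta, Ds, n_frames = 10.0, 12.0, 100

fifo_delay, wait = [], 0.0
for n in range(n_frames):
    fifo_delay.append(wait + Ds)        # total delay seen by the n-th frame
    wait = max(0.0, wait + Ds - Ta)     # waiting time compounds when Ds > Ta

latest_delay = [Ds] * n_frames          # latest-frame policy: delay stays at one service time

print("FIFO delay of last frame  :", fifo_delay[-1], "ms")
print("Latest-frame policy delay :", latest_delay[-1], "ms")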
Figure 3.9. IP Core Subsystems of Accelerated CNN Algorithm Using Vivado
2020.1
Table 3.1. Comparison of Resources Utilized for CNN on Zynq XC7Z020 Platform.
From Table 3.1, it can be seen that LUTs and DSPs are effectively utilized in the proposed implementation, but FF and BRAM utilization is higher because of the large memory occupied by the high-resolution images in the MS-COCO dataset [101]. This limitation can be avoided by optimizing the convolutional layers in the algorithm for high-resolution images. The existing implementations of object detection on different platforms are summarised in Table 3.2. As this chapter focuses on the real-time implementation, optimizing these layers is not attempted.
Table 3.2. Overview of Resources Utilized For Object Detection on Different Platforms in Existing Works.
Parameters | Target networks | Target platforms | LUTs | Flip-flops | BRAMs | DSPs | Clock freq. (MHz) | Power (W)
K. Guo et al. [78], (2016) | Angel-Eye (only for face detection) | XC7Z020 | 27k | 24k | 68 | 198 | 100 | NA
K. Guo et al. [78], (2016) | Angel-Eye (only for face detection) | XC7Z045 | 182.6k | 127.6k | 486 | 780 | 150 | 9.63
Z. Yu et al. [77], (2020) | CNNv3 tiny | Zedboard 7035 | 25.9k | 43.7k | 185 | 160 | 663.7 | 3.36
3.6 SUMMARY
The Vivado HLS tool is used for the synthesis and implementation of the object detection algorithm on the SoC, which achieved better resource utilization of 43.6% of LUTs, 43.04% of Flip-flops, 82.17% of BRAMs and 79% of DSPs compared with other hardware implementations. The real-time acceleration of the CNN algorithm for object detection takes 10.125 ms per detection. The results prove that the performance is 2x more efficient than previous works and that the prediction time is also very low when the CNN algorithm is implemented on FPGA hardware.
CHAPTER 4
PERSPECTIVES
4.1 CONCLUSION
The research is further fueled by the application of deep learning in image processing applications. Among several deep learning based object detection algorithms, the CNN algorithm was chosen and its acceleration was attempted. In addition, a hardware-based neural network for object detection was designed. The model is trained on the MS-COCO benchmark dataset and the outcomes are compared with existing implementations. For the proposed acceleration of real-time detection, the prediction time is about 10.12 ms, which is faster than existing SoC accelerations.
The research work can be extended to incorporate full reconfiguration or partial reconfiguration (PR), which are key features of FPGAs. The most difficult problems are programming for reconfigurable architectures and effective virtualization of FPGA resources for PR. With the development of FPGA technology, it is possible to implement reconfigurability for real-time image processing applications.
Modern FPGAs have greater capacity and faster memory speeds than in the past, allowing for a larger design space. In our research, we discovered that there may be a performance difference of up to 95% between two different solutions that use identical logic resources of an FPGA. It is not trivial to settle on one optimal solution, particularly when the computation resources and memory bandwidth of an FPGA platform are taken into account. Therefore, if an accelerator structure is not designed properly, its compute performance will be insufficient to match the memory bandwidth available on the FPGA, which means that performance suffers as a result of under-utilization of either logic resources or memory bandwidth.
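This balance between logic resources and memory bandwidth can be captured with a roofline-style estimate. The sketch below uses assumed example figures (peak compute, external memory bandwidth and computation-to-communication ratio), not measurements from this work: the attainable throughput is the lower of the compute roof and the bandwidth roof.

# Roofline-style sketch: the attainable throughput of an accelerator is limited
# either by its compute roof or by memory bandwidth times the computation-to-
# communication (CTC) ratio. All figures below are assumed examples.
peak_compute_gops = 20.0      # compute roof of the accelerator (GOP/s)
memory_bw_gbps = 4.0          # external memory bandwidth available to the PL (GB/s)
ctc_ratio_ops_per_byte = 2.5  # operations performed per byte moved from memory

attainable = min(peak_compute_gops, ctc_ratio_ops_per_byte * memory_bw_gbps)
print("Attainable performance:", attainable, "GOP/s")   # 10.0 GOP/s with these figures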
[1] S.A. Fahmy, K. Vipin, FPGA dynamic and partial reconfiguration: A survey of
architectures, methods, and applications. Comput. Surveys 51, pp. 1-39, 2018.
[2] Xilinx, ZC702 Evaluation Board for the PYNQ-Z2 XC7Z020 SoC: User Guide (2017). [Online]. Available at: https://www.xilinx.com/support/documentation/boards_and_kits/zc702_zvik/ug850-zc702-eval-bd.pdf. (Accessed on 23rd March, 2022)
[3] Xilinx Inc., PYNQ-Z2 All Programmable SoC Technical Reference Manual (2021). [Online]. Available at: https://www.xilinx.com/support/documentation/user_guides/ug585-PYNQ-Z2-TRM.pdf. (Accessed on 23rd March, 2022)
[4] Xilinx Inc., “Vivado Design Suite Tutorial: High-Level Synthesis,” UG871 (v2014.1), May 6, 2014. [Online]. Available at: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug871-vivado-high-level-synthesis-tutorial.pdf. (Accessed on 23rd March, 2022)
[15] I. Kuon, R. Tessier and J. Rose, "FPGA Architecture: Survey and Challenges", J.
Found. and Trends in Electronic Design Automation, vol. 2, no. 2, pp. 135-253,
2008.
[20] D. G. Bailey, “Image processing using FPGAs,” Journal of Imaging, vol. 5, no. 53,
pp. 1–4, 2019.
[22] A. B. Amara, E. Pissaloux and M. Atri, “Sobel edge detection system design and
integration on an FPGA based HD video streaming architecture,” in 11th Int.
Design & Test Sym. (IDT), Hammamet, Tunisia, pp. 160–164, 2016.
[23] E. Onat, “FPGA implementation of real time video signal processing using Sobel,
Robert, Prewitt and Laplacian filters,” in 25th Signal Processing and
Communications Applications Conf. (SIU), Antalya, Turkey, pp. 1–4, 2017.
[24] R. Tessier, I. Kuon, J. Rose, “FPGA architecture: survey and challenges,” Found.
Trends Electron. Des. Autom. 2(2), pp. 135–253, 2008.
[25] Y. Fang, L. Yu and S. Fei, "An Improved Moving Tracking Algorithm With Multiple
Information Fusion Based on 3D Sensors," in IEEE Access, vol. 8, pp. 142295-
142302, 2020, doi: 10.1109/ACCESS.2020.3008435.
[26] Y. Wang and X. Mu, "Dynamic Siamese Network With Adaptive Kalman Filter for
Object Tracking in Complex Scenes," in IEEE Access, vol. 8, pp. 222918-222930,
2020, doi: 10.1109/ACCESS.2020.3043878.
[27] S. Yang and M. Baum, "Extended Kalman filter for extended object tracking,"
2017 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2017, pp. 4386-4390, doi: 10.1109/ICASSP.2017.7952985.
[28] C. G. Prevost, A. Desbiens and E. Gagnon, "Extended Kalman Filter for State
Estimation and Trajectory Prediction of a Moving Object Detected by an
Unmanned Aerial Vehicle," 2007 American Control Conference, 2007, pp. 1805-
1810, doi: 10.1109/ACC.2007.4282823.
[29] I.A. Iswanto, T. Choa, B. Li, “Object tracking based on meanshift and particle-Kalman filter algorithm with multi features,” Procedia Computer Science, vol. 157, pp. 521–529, 2019.
[30] F. Farahi, H.S. Yazdi, “Probabilistic Kalman filter for moving object tracking,”
Signal Processing: Image Communication, vol. 82 no. 10, pp.115751, 2020.
[31] E. Gundogdu, A. A. Alatan, “Good features to correlate for visual tracking,” IEEE
Transactions on Image Processing, vol. 27, no. 5, pp. 2526–2540, 2018.
[34] J.-M. Jeong, T.-S. Yoon, J.-B. Park, Kalman filter based multiple objects detection-tracking algorithm robust to occlusion, in: 2014 Proceedings of the SICE Annual Conference (SICE), 2014, pp. 941–946, https://doi.org/10.1109/SICE.2014.6935235.
[36] Z. Zhou, X. Gao, J. Xia, Z. Zhu, D. Yang, J. Quan, “Multiple instance learning
tracking based on Fisher linear discriminant with incorporated priors,” Int. J. Adv.
Robotic Syst. vol. 15(1), pp. 1–19, 2018. doi:
https://doi.org/10.1177/1729881417750724.
[37] Z. Zhou, J. Wang, Y. Wang, Z. Zhu, J. Du, X. Liu, J. Quan, “Visual tracking using
improved multiple instance learning with co-training framework for moving
robot,” KSII Trans. Internet Inf. Syst. vol. 12 (11), pp. 5496–5521, 2018.
[38] E. Gundogdu, A.A. Alatan, Good features to correlate for visual tracking, IEEE Trans. Image Process. 27(5) (2018) 2526–2540. doi:10.1109/TIP.2018.2806280.
[39] J. S. Bergstra, R. Bardenet, Y. Bengio and B. Kégl, "Algorithms for hyper-
parameter optimization", Proc. 24th Int. Conf. Neural Inf. Process. Syst., pp.
2546- 2554, 2011.
[42] D. Yuan, X. Chang, P.Y. Huang, Q. Liu, Z. He, Self-supervised deep correlation
tracking, IEEE Trans. Image Process. 30, pp. 976–985, 2021.
https://doi.org/10.1109/TIP.2020.3037518.
[44] J.V. Fonseca, R.C.L. Oliveira, J.A.P. Abreu, E. Ferreira, M. Machado, Kalman filter embedded in FPGA to improve tracking performance in ballistic rockets, in: 2013 UKSim 15th International Conference on Computer Modelling and Simulation, 2013, pp. 606–610, https://doi.org/10.1109/UKSim.2013.149.
[45] Al-Rababah, A.A. Qadir, Embedded architecture for object tracking using Kalman filter, J. Comput. Sci. 12(5), pp. 241–245, 2016. doi:10.3844/jcssp.2016.241.245.
[46] W. Liu, H. Chen, L. Ma, Moving object detection and tracking based on Zynq FPGA and ARM SoC, IET International Radar Conference, pp. 1–4, 2015. https://doi.org/10.1049/cp.2015.1356.
[48] P. Rao, M.A. Bayoumi, An efficient VLSI implementation of real-time Kalman filter, IEEE International Symposium on Circuits and Systems, pp. 2353–2356, 1990. https://doi.org/10.1109/ISCAS.1990.112482.
[49] L. Bossuet, G. Gogniat, J. Diguet, J. Philippe, A modeling method for
Reconfigurable Architectures, in: System-on-Chip for Real-Time Applications. The
Kluwer International Series in Engineering and Computer Science, vol. 711, 2003.
doi:10.1007/978-1-4615-0351-4_16.
[54] Q. Iqbal et al., Design and FPGA implementation of an adaptive video subsampling algorithm for energy-efficient single object tracking, in: Proc. 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 3065–3069.
[56] W. Liu, D. Anguelov, D. Erhan, C. Szegedy et al., “SSD: Single Shot MultiBox
Detector,” in European Conference on Computer Vision, Cham, Switzerland, pp.
21-37, 2016.
[57] K. He, X. Zhang, S. Ren, J. Sun, “Spatial Pyramid Pooling in Deep Convolutional
Networks for Visual Recognition,” in European Conference on Computer Vision,
Cham, Switzerland, pp. 346-361, 2014.
[58] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich Feature Hierarchies for
Accurate Object Detection and Semantic Segmentation," 2014 IEEE Conference
on Computer Vision and Pattern Recognition, pp. 580-587, 2014.
[59] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object
Detection with Region Proposal Networks," in IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
[60] S. Liu, et al., "Path Aggregation Network for Instance Segmentation," 2018
IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8759-
8768, 2018.
[61] T. -Y. Lin et al., "Feature Pyramid Networks for Object Detection," 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936-944,
2017.
[62] J. Redmon, A. Farhadi, “CNN9000: Better, Faster, Stronger,” in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 7263–7271, 2017.
[63] A. Bochkovskiy, C.Y. Wang, H.Y.M Liao, “CNN: Optimal Speed and Accuracy of
Object Detection,” arXiv:2004.10934, 2020.
[64] Y. Zhou and J. Jiang, “An FPGA-based accelerator implementation for deep convolutional neural networks,” in Proceedings of the 2015 4th International Conference on Computer Science and Network Technology, ICCSNT 2015, Harbin, China, vol. 1, pp. 829–832, 2015.
[65] A. Shawahna, S.M. Sait and A. El-Maleh, “FPGA Based Accelerators of Deep Learning Networks for Learning and Classification: A Review,” IEEE Access, vol. 7, pp. 7823–7859, 2019.
[66] Wang, E., Davis, J., Zhao, R., Ng, H-C., et al.: Deep neural network approximation for custom hardware: Where we've been, where we're going. ACM Comput. Surv. 52(2), pp. 1–39, 2019.
[68] Blaiech, A.G., Khalifa, K.-B., Valderrama, CV., et al.: A Survey and Taxonomy of
FPGA-based Deep Learning Accelerators. J. Syst. Architect. 98, 331–345, 2019.
[69] HajiRassouliha, A., Taberner, A.J., Nash, M.P., Nielsen, P.M.F.: Suitability of
recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and
image processing algorithms. Signal Process. Image Comm. 68, 101–119, 2018.
[70] Tong, K., Wu, Y., Zhou, F.: Recent advances in small object detection based on
deep learning: A review. Image Vis. Comput. 97, 103910, 2020.
[72] C. Ding, S. Wang, N. Liu, K. Xu, et al., “REQ-CNN: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs,” in: 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, CA, USA, pp. 33–42, 2019.
[76] D.T. Nguyen, T.N. Nguyen, H. Kim, H-J. Lee, “A High-Throughput and Power-Efficient FPGA Implementation of CNN for Object Detection,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 27(8), pp. 1861–1873, 2019.
[78] K. Guo, L. Siu, J. Qiu, S. Yao, et al. “Angel-Eye: A Complete Design Flow for
Mapping CNN onto Customized Hardware,” In: IEEE Computer Society Annual
Symposium on VLSI (ISVLSI), Pittsburgh, PA, USA, pp. 24–29, 2016.
[79] G. Wei, Y. Hou, Q. Cui, G. Deng, et al., “CNN Acceleration using FPGA Architecture,” in: IEEE/CIC International Conference on Communications in China (ICCC), Beijing, China, pp. 734–735, 2018.
[80] C. Zhang, P. Li, G. Sun, Y. Guan, et al., “Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks,” in: 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, pp. 161–170, 2015.
[81] Cambay, V.Y., Uçar, A., and Arserim, M.A.: Object Detection on FPGAs and GPUs by Using Accelerated Deep Learning. In: International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey, pp. 1–5, 2019.
[82] D. Pestana, et al., “A Full Featured Configurable Accelerator for Object Detection With CNN,” IEEE Access, vol. 9, pp. 75864–75877, 2021.
[83] N. Zhang, X. Wei, H. Chen and W. Liu, “FPGA Implementation for CNN-Based
Optical Remote Sensing Object Detection,” Electronics, 2021.
[84] R.E. Kalman, A new approach to linear filtering and prediction problems, Trans.
ASME–J. Basic Eng. 82 (1) (1960) 35–45.
[85] Q. Li, R. Li, K. Ji, W. Dai, Kalman filter and its application, in: 2015 8th International
Conference on Intelligent Networks and Intelligent Systems (ICINIS), IEEE, 2015,
pp. 74–77.