Image Convolution on FPGAs: The Implementation of a Multi-FPGA FIFO Structure

Arrigo Benedetti*, Andrea Prati, Nello Scarabottolo

Dept. of Engineering Sciences, Università di Modena, via Campi 213, I-41100 Modena, Italy
phone +39 59 376739, fax +39 59 376799, e-mail {benedett, prati, scarabot}@dsi.unimo.it

* Presently at the Computational Vision Laboratory, California Institute of Technology, Pasadena

Abstract
In this paper, we present an implementation of a real-time convolver that uses Field Programmable Gate Arrays (FPGAs) to perform the convolution operations. The main characteristics of the proposed approach are the use of external memory to implement a FIFO buffer where incoming pixels are stored, and the partitioning of the convolution matrix among several FPGAs, in order to allow data-parallel computation and to increase the size of the convolution kernel.

Figure 1. Convolution of a raster-scan image (K=R=3, M=N=5)

1089-6503/98 $10.00 © 1998 IEEE

1. Introduction

The work presented in this paper is one of the results of a research activity aimed at implementing dedicated architectures for image processing. In particular, the research activity focuses on reconfigurable devices like Field Programmable Gate Arrays (FPGAs) [1] as the most promising ones in applications where processing speed, typical of dedicated solutions, has to be matched with low-cost, flexible systems capable of performing several different tasks.

Among the various algorithms frequently encountered in image processing, we focus on 2-D convolution, where an M×N pixel input image has to be convolved with a K×R kernel to obtain an output image in which each pixel depends on a K×R window of neighboring pixels in the input image [2] [3], as shown in Fig. 1.

Even though the problem is well known and has been solved both in software and in dedicated hardware, to our knowledge a satisfactory solution for FPGA-based real-time convolution is still lacking.

To reach such real-time behavior, it is in fact necessary to process the input image pixels, which enter the convolution system in the usual raster-scan order, row by row, so as to ensure an equal rate of output pixels (avoiding unlimited buffering) and to minimize processing latency.

In this perspective, a software-based solution, where the whole image is stored in RAM before starting the FPGA computation, seems inadequate because of the added latency and the large number of memory accesses required, possibly exceeding the real-time constraints.

A far better approach consists in implementing a pipeline processor, where pixels enter the pipeline and produce results as soon as the convolution window has been filled. Referring again to Fig. 1, this means that we need to store (K-1)×N+R pixels, and that a processing latency, relative to the central pixel of the convolution window, of

$$\frac{K-1}{2}\,N + \frac{R-1}{2}$$

pixel periods is introduced.

To accomplish this, it is possible to adopt a set of shift registers [4], implementing a FIFO memory able to store the (K-1)×N+R pixels mentioned above. For an n-bit-per-pixel image (i.e., a $2^n$ colors/gray-levels image) this leads to n shift registers, as shown in Fig. 2, where single flip-flops store the pixels belonging to the convolution window (i.e., the pixels required to compute the current convolution product) and FIFO blocks store the pixels already grabbed and needed for subsequent convolution products.

Figure 2. Memory structure allowing real-time convolution.
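The shift-register memory of Fig. 2 can be modelled in software as a single delay line of (K-1)×N+R pixels: because pixels arrive in raster order, every element of the K×R window sits at a fixed tap of that line. The Python sketch below is an illustrative behavioral model under our own assumptions (row-major input, one pixel per video clock, only windows lying fully inside the image); the function names are hypothetical and this is not the authors' VHDL. The kernel is applied without flipping (correlation form), which only changes the coefficient order with respect to a textbook convolution.

```python
from collections import deque


def shift_register_convolve(image, kernel):
    """Behavioral model of the Fig. 2 memory: one pixel shifted in per video clock.

    Returns the 'valid' output image (windows fully inside the input) and the
    number of pixels the delay line has to store, (K-1)*N + R.
    """
    M, N = len(image), len(image[0])
    K, R = len(kernel), len(kernel[0])
    length = (K - 1) * N + R
    line = deque(maxlen=length)            # index 0 = oldest stored pixel
    out = [[0] * (N - R + 1) for _ in range(M - K + 1)]
    for r in range(M):
        for c in range(N):
            line.append(image[r][c])       # shift; the oldest pixel drops out
            if r >= K - 1 and c >= R - 1:  # bottom-right pixel of a full window
                # fixed tap i*N + j holds input pixel (r-K+1+i, c-R+1+j)
                out[r - K + 1][c - R + 1] = sum(
                    kernel[i][j] * line[i * N + j]
                    for i in range(K) for j in range(R))
    return out, length


def direct_convolve(image, kernel):
    """Reference: the same sum of products computed on the fully stored image."""
    K, R = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(K) for j in range(R))
             for x in range(len(image[0]) - R + 1)]
            for y in range(len(image) - K + 1)]


img = [[(3 * y + 5 * x) % 17 for x in range(5)] for y in range(5)]
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
out, stored = shift_register_convolve(img, sobel_x)
assert out == direct_convolve(img, sobel_x)
print(stored)  # 13, i.e. (3-1)*5 + 3 for the M=N=5, K=R=3 case of Fig. 1
```

The taps never move, which is why this structure is natural in dedicated hardware; its cost is that every one of the (K-1)×N+R pixels occupies n storage elements.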
This approach, quite obvious in dedicated devices, is not well suited to an FPGA implementation. In fact, commercially available FPGA devices contain a limited number of memory elements, since they consist of a few hundred Configurable Logic Blocks (CLBs), each provided with one or two flip-flops and a configurable logic network. Implementing such a memory structure on FPGAs means using many devices, thus wasting their (re)configuration properties (e.g., even with rather large FPGA devices like the Xilinx XC4010E, a 3x3 convolution matrix applied to a 512x512 image using 8 bits per pixel needs (512×2+3)×8 = 1027×8 = 8216 flip-flops, requiring 11 FPGAs).
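The flip-flop count above follows directly from the (K-1)×N+R pixels the shift-register structure must hold, multiplied by n bits per pixel. The short calculation below reproduces it. The figure of 800 CLB flip-flops per XC4010E (400 CLBs with two flip-flops each) is our assumption, chosen because it yields the 11 devices stated in the text; I/O-block flip-flops are ignored, and the helper names are hypothetical.

```python
import math


def shift_register_flip_flops(K, R, N, bits_per_pixel):
    """Flip-flops needed to hold (K-1)*N + R pixels of bits_per_pixel bits each."""
    return ((K - 1) * N + R) * bits_per_pixel


def devices_needed(flip_flops, ffs_per_device=800):
    # Assumption: XC4010E = 400 CLBs x 2 flip-flops; I/O-block flip-flops ignored
    return math.ceil(flip_flops / ffs_per_device)


ffs = shift_register_flip_flops(K=3, R=3, N=512, bits_per_pixel=8)
print(ffs, devices_needed(ffs))  # 8216 11
```

Only K×R×n of these flip-flops (72 in the 3x3, 8-bit case) belong to the window itself; the rest are pure row buffering, which is the part worth moving off-chip.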
In fact, previous works (such as [5] and [6]) have presented methods for synthesizing FIFO memories using the flip-flops inside the CLBs of the Xilinx XC4000 series devices: unfortunately, the number of available CLBs only permits FIFO memories far too small for our needs. For this reason, the solution we propose implements the shift registers of Fig. 2 also using external RAM interfaced with the convolution FPGAs. In particular, the flip-flops storing pixel values belonging to the convolution window are implemented in internal CLBs, while the FIFO blocks storing pixels needed for subsequent convolution operations are mapped onto the external RAM.

The implementation of such a FIFO and its possibilities in terms of speed and kernel size are the main topics of this paper.

In Sec. 2, we discuss the requirements of the FIFO memory and the solution we have implemented for interfacing it with a single FPGA.

In Sec. 3, we extend the FIFO capabilities to allow parallel processing by several FPGAs, thus relaxing the constraints on the convolution matrix dimension.

In Sec. 4, an experimental prototype written in VHDL and implemented on a GigaOps G800 Spectrum board containing 8 Xilinx XC4010E devices is discussed.

Finally, possible developments are outlined in Sec. 5.

Figure 3. Subsequent steps in the convolution of a raster-scan image (K=R=3, M=N=5), panels (a) and (b). Legend: pixels already grabbed; convolution matrix; central pixel.

2. The FIFO memory interfaced with a single FPGA

To better understand the requirements of the external FIFO memory, it is worth referring to Fig. 3.

In order to minimize the number of external memory accesses needed to obtain the required FIFO behavior, we propose an approach based on the following assumptions:

• each new pixel entering the convolution system (D23 in Fig. 3-b) is routed both to the FPGA convolver and to the external FIFO memory, where it is stored in a single write cycle;
• the K-1 pixels belonging to the convolution "column" of the new pixel (D03 and D13 in Fig. 3-b) must be read in K-1 read cycles in order to be routed to the FPGA convolver;
• the remaining (R-1)×K pixels (D01, D02, D11, D12, D21 and D22 in Fig. 3-b) needed to perform the convolution are stored inside the FPGA.

Summarizing, for each new pixel coming from the video signal source via the frame grabber, 1 memory write and K-1 memory reads must be performed to correctly feed the FPGA convolver.
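The access pattern just summarized (one write and K-1 reads per incoming pixel, with only the window pixels held in on-chip registers) can be sketched as a behavioral Python model. This is a simplified illustration under our own assumptions, not the paper's circuit: the external RAM is a plain list used as a circular buffer of (K-1)×N words (larger than the minimal size derived later in this section, because the model does not reuse the SHIFT registers), each video clock performs its K-1 reads before the write, K ≥ 2 is assumed, and the names `ExternalFifoConvolver`, `push` and `stream` are hypothetical.

```python
class ExternalFifoConvolver:
    """Model of a K x R raster-scan convolver fed by an external circular FIFO (K >= 2)."""

    def __init__(self, kernel, width):
        self.kernel = kernel
        self.K, self.R, self.N = len(kernel), len(kernel[0]), width
        self.depth = (self.K - 1) * width   # words of external RAM (model size)
        self.ram = [0] * self.depth
        self.wr = 0                         # write pointer, cf. counter_wr
        # on-chip window registers; column R-1 holds the newest column
        self.window = [[0] * self.R for _ in range(self.K)]
        self.reads = self.writes = 0
        self.count = 0                      # pixels received so far

    def push(self, pixel):
        """One video clock: K-1 RAM reads, 1 RAM write, then an output or None."""
        K, R, N = self.K, self.R, self.N
        column = []
        for i in range(K - 1):
            # same image column, K-1-i rows above: written (K-1-i)*N pixels ago
            addr = (self.wr - (K - 1 - i) * N) % self.depth
            column.append(self.ram[addr])
            self.reads += 1
        column.append(pixel)                # new pixel goes straight to the window
        self.ram[self.wr] = pixel           # ...and into the FIFO, after the reads,
        self.writes += 1                    # because it reuses the oldest slot
        self.wr = (self.wr + 1) % self.depth
        for i in range(K):                  # shift window left, append new column
            self.window[i] = self.window[i][1:] + [column[i]]
        row, col = divmod(self.count, N)
        self.count += 1
        if row >= K - 1 and col >= R - 1:   # window lies fully inside the image
            return sum(self.kernel[i][j] * self.window[i][j]
                       for i in range(K) for j in range(R))
        return None


def stream(image, kernel):
    """Feed an image in raster order; return the valid outputs and the model."""
    conv = ExternalFifoConvolver(kernel, len(image[0]))
    results = [conv.push(p) for row in image for p in row]
    return [v for v in results if v is not None], conv


img = [[(3 * y + 5 * x) % 17 for x in range(5)] for y in range(5)]
laplace = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
values, conv = stream(img, laplace)
print(len(values), conv.writes, conv.reads)  # 9 25 50
```

The counters confirm the budget stated above: every pixel costs exactly one write and K-1 reads, independently of R, which is why R can grow without touching the memory bandwidth.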

Figure 4. Behavior of the external FIFO (M=N=5, K=R=3): (a) after the first R-1=2 pixels; (b) after N-R+1=3 more pixels; (c) after N=5 more pixels; (d) after N-R+1=3 more pixels.
This has been accomplished in the FPGA system by implementing the FIFO memory as a circular buffer stored in RAM, based on the following elements:

• The input/output clock signal, here called video clock, which synchronizes frame grabber and video encoder operations. During each cycle of this clock, a new pixel enters the convolver and a new complete convolution takes place.
• A convolution clock, whose frequency f_conv is K times higher than the frequency f_play of the play clock. During each cycle of the convolution clock, one memory access (either read or write) to the external FIFO is performed.
• A write pointer counter_wr, stored inside the FPGA convolver, used to address the first free location inside the external FIFO, where the incoming pixel has to be stored.
• K-1 read pointers counter_rd(i), also implemented inside the FPGA, used to address the locations of the pixels to be read from the external FIFO to complete the computation of the convolution window.

Using the above elements, the external FIFO, which requires a memory able to accommodate at least (K-2)×N+(N-R+1)+1 = (K-1)×N-R+2 pixels, is obtained as summarized in Fig. 4, where a 3x3 convolution window and a 5x5 input image are assumed.

Before starting to fill the FIFO, R-1 pixels have to be shifted into the SHIFT1 register (Fig. 4(a)). The next N-R+1 pixels are stored in the FIFO denoted FIFO1 in Fig. 2, filling it until counter_wr reaches location 0002 (Fig. 4(b)). At this time counter_rd(1) is enabled and starts to address data from FIFO1 and to shift them into SHIFT2. After N additional pixels (i.e., once a full image row has been read), counter_rd(0) is enabled in turn and begins to store data in SHIFT3. This situation is shown in Fig. 4(c). After N-R+1 additional pixels, the first convolution window is available and the FPGA starts the computation, using counter_rd(0) and counter_rd(1) to fetch D02 and D12 from the FIFO (Fig. 4(d)).

Figure 5. Multi-FPGA architectural schema.

It should be noticed that this approach poses the following limits on the dimension of the convolution matrix:

1) a first limit derives from the maximum frequency allowed for the convolution clock, used to access the external RAM to fetch the next "column" of pixels. This limit thus affects the maximum number of rows (hence the value of K) in the convolution matrix: referring to the 8.33 MHz video clock (adopted by the European PAL video standard) and
using 20 nsec access time RAM (i.e., a maximum f_conv of 50 MHz), this results in a convolution matrix of at most 6 rows;

2) a second limit derives from the capacity of the FPGA convolver which, even in the case of particularly favorable convolution matrices (containing only 0s, 1s and power-of-2 coefficients, thus requiring only logical operations and shifts, e.g. the Sobel, Prewitt and Laplace matrices [2], which can be implemented using few CLBs), has to store the K×R pixels to be handled in its internal flip-flops. This limit thus affects the overall dimension of the convolution matrix.

It can easily be shown that the most critical parameter for handling large convolution kernels is the number K of rows; moreover, for a given "area" of the convolution matrix, it is more convenient to minimize K and maximize R in order to relax the clock constraints.

On the basis of the above considerations, we have devised a multi-FPGA structure, presented in the next section.

3. Multi-FPGA solution

When several FPGA devices are available, the most convenient way of partitioning the computation among them is represented in Fig. 5: n FPGAs are used to process in parallel n different portions of the convolution window, and an additional FPGA (the 0-th one in Fig. 5) merges the intermediate results, usually by performing simple sums. This would mean subdividing the K×R convolution matrix into n sub-matrices of size k×r, arranged in p groups of rows and q groups of columns, where the following relations hold:

$$p \times q = n, \qquad p \times k = K, \qquad q \times r = R$$

A great deal of work exists on partitioning FPGA designs (e.g., [7], [8] and [9]); however, given the considerations expressed in the previous section, the best choice for the convolution matrix subdivision in our case is given by

$$p = n, \qquad q = 1,$$

which means a set of horizontal "stripes".

It must be noticed, however, that the second-level FIFO blocks shown in Fig. 5 have to be included to ensure that the results of each single convolution stripe reach FPGA0 aligned in time. These FIFOs introduce the need for two additional accesses to external RAM: one to write the input and one to read the output. In other words, all the FPGAs but the last one must use two cycles of the convolution clock to access the second-level FIFOs, which leaves two fewer accesses for their first-level FIFO, the one containing the pixel values to be multiplied.

Since the number of allowed accesses to the first-level FIFOs is the upper limit on the number of rows in each stripe, and since the last FPGA does not need a second-level FIFO, the best subdivision of the convolution matrix can be obtained as follows.

Referring to Fig. 5, the original K rows must be partitioned into p stripes of R columns: one stripe of k2 rows and p-1 stripes of k1 rows. From the previous considerations, k2 = k1+2. Obviously, to correctly partition the original K rows, k1×(n-1)+k2 = K must hold. From this equation we obtain

$$k_1 = \frac{K-2}{n}, \qquad k_2 = \frac{K-2}{n} + 2. \qquad (2)$$

Delay units in Fig. 5 allow each FPGA to obtain its own stripe of data.

Besides the advantages already discussed, this approach allows the convolution matrix to be optimally mapped onto n FPGAs provided that

$$(K-2) \bmod n = 0,$$

without any other constraint on R.

To estimate the effectiveness of this approach, let us revisit the practical example given at the end of the previous section. For a 20 nsec access time RAM used in a system working with the European PAL video standard, we have:

• f_play = 8.33 MHz;
• maximum f_conv = 50 MHz;
• maximum number k of accesses to external RAM in a video clock cycle: k = ⌊f_conv / f_play⌋ = 6;
• maximum K dimension of the convolution window: (n-1)×(6-2)+6 = 4n+2.

For a system with n=8 FPGAs, using only square kernels, this means a 34x34 upper limit on the kernel dimension. If we drop the requirement of square matrices, it is possible to further increase R up to the space limit given by the number of available CLBs.

4. Prototypal implementation

In this section, we present a VHDL [10] implementation of the multi-FPGA structure discussed in the previous section, on a multi-FPGA board designed for rapid prototyping [11].

To this purpose, we first describe the main characteristics of the prototyping board, with particular reference to those limiting the degrees of freedom in mapping computation and data paths onto the available FPGAs. Then, we show our solution and evaluate the results.

4.1 Prototypal board

The prototyping board we used is the GigaOps G800 Spectrum board [12], schematically shown in Fig. 6. In Fig. 6 it is possible to notice the main blocks of this system:

• The actual computation is performed by pairs of Xilinx XC4010E FPGAs, connected in modules called XMODs: in Fig. 6, four modules (MOD0 through MOD3) are shown. These FPGAs are named YPGA and XPGA (from the name of the bus they are connected to). Both these FPGAs have two memory ports: one connected only to a 2 MBytes DRAM and one connected both to a 2 MBytes DRAM and to a 128 KBytes SRAM device. XPGA and YPGA communicate through a bus switch on the first memory port. This switch works on two virtual busses: a 16-bit data bus and a 10-bit address bus. It is important to stress that only the YPGAs are connected to the YBUS, i.e. to the input/output data bus.
• A module called SCVIDMOD (S-VIDEO, COMPOSITE, VIDEO MODULE), which decodes/encodes video signals (PAL or NTSC). This module interfaces to the YBUS for data input and output.
• An input FPGA (here called VLPGA) connected to the VESA local bus of the PC hosting the board. The VLPGA is interfaced with the HBUS and the YBUS. It contains all the registers needed for correct board operation (e.g., the CLKMODE register, which sets the frequencies of the clocks distributed on the board).
• An output FPGA (here called VMC) connected to the SCVIDMOD. This is an additional FPGA, directly interfaced with the video output and the XBUS.
• Three main busses that allow connections among the various blocks of the board. These busses are:
  - YBUS, a 32-bit I/O bus connected with VLPGA, VMC and the YPGAs of the XMODs;
  - HBUS, a 16-bit bus used to configure and to load the FPGAs;
  - XBUS, a 64-bit bus normally used as four 16-bit data busses. Each of these busses is connected only to the XPGAs of the XMODs and to VMC.

Figure 6. Block diagram of the GigaOps G800 board

The main data path of our application is the following: pixels generated by the video decoder are passed through the YBUS both to the VLPGA and to the YPGAs of the XMODs. These modules process the data and pass the results either to the VMC FPGA through the YBUS or to the XPGAs through the bus switches. In the latter case, the XPGAs can perform a further computation or simply pass the results to the VMC FPGA through the 64-bit XBUS. In both cases, the VMC FPGA outputs the results of its processing on data coming from the XBUS or the YBUS.

4.2 The convolver prototype

A first characteristic that limits the degrees of freedom of this system is the kind of access to input/output data. In fact, the XPGAs are not connected to the YBUS; therefore they receive data to be processed only through the YPGAs. This affects the effectiveness of the implementation: in particular, a multi-FPGA architecture that also uses the XPGAs is more difficult to handle than an architecture using only YPGAs.

Another limit is constituted by the busses that interconnect the XPGAs with the YPGAs. These busses are also used to access the 2 MBytes DRAMs. It is then extremely difficult to manage and to synchronize both the connections with the XPGA and those with the DRAM.

It is easy to note that the above restrictions make it very hard and ineffective to use both XPGAs and YPGAs to map computations.

On the contrary, the XPGAs become useful to implement the second-level FIFOs of Fig. 5: this approach, in fact, relaxes the constraints on the maximum number of accesses defined in the previous section. Thus, we can perform on each YPGA a convolution with a k×r kernel.

Following the above considerations, we decided to implement our prototype using only YPGAs to perform convolution. In particular, since our own board has four XMODs, our multi-FPGA convolver uses the 4 YPGAs to map the computation and the 4 XPGAs simply to implement second-level FIFOs.

Some additional considerations are needed to completely understand our prototype. Since every memory access requires first presenting the memory address and asserting the control signals (like memory read and memory write), and then deactivating these control signals before the next memory access, it is necessary to identify two timing events for starting and ending each memory access.
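The sizing rules derived in Sections 2 and 3 (the per-video-clock access budget k, the single-FPGA row limit, the minimal external FIFO, the stripe sizes of equation (2) and the (n-1)×(k-2)+k bound on K) can be checked with a few lines of arithmetic. The Python sketch below uses hypothetical helper names and the paper's numbers (20 nsec RAM, i.e. 50 MHz, and the 8.33 MHz PAL video clock); it assumes that every row of a stripe costs exactly one first-level FIFO access per video clock, and it ignores CLB capacity.

```python
import math


def access_budget(f_conv_mhz, f_play_mhz):
    """External-RAM accesses that fit in one video clock cycle."""
    return math.floor(f_conv_mhz / f_play_mhz)


def min_fifo_depth(K, R, N):
    """Sec. 2: smallest external FIFO, (K-2)*N + (N-R+1) + 1 = (K-1)*N - R + 2."""
    return (K - 2) * N + (N - R + 1) + 1


def single_fpga_max_rows(k):
    # 1 write + (K-1) reads per pixel must fit in k accesses, so K <= k
    return k


def multi_fpga_max_rows(k, n):
    # the first n-1 FPGAs spend 2 accesses on their second-level FIFO
    return (n - 1) * (k - 2) + k


def stripe_rows(K, n):
    """Equation (2): k1 rows for each of n-1 FPGAs, k2 = k1 + 2 for the last."""
    if (K - 2) % n != 0:
        raise ValueError("optimal mapping requires (K - 2) mod n == 0")
    k1 = (K - 2) // n
    return k1, k1 + 2


k = access_budget(50, 8.33)                  # 20 nsec RAM, PAL video clock
K = multi_fpga_max_rows(k, n=8)
print(k, single_fpga_max_rows(k), K, stripe_rows(K, n=8))  # 6 6 34 (4, 6)
```

The last line reproduces the 6-row single-FPGA limit and the 34x34 bound for 8 FPGAs, and shows that the 34 rows split exactly into seven 4-row stripes plus one 6-row stripe, i.e. k1 = k-2 and k2 = k.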
Figure 7. Behavior of the external FIFO implemented on dual-port memory (K=R=3, M=N=5): (a) after the first R-1=2 pixels; (b) after N-R+1=3 more pixels; (c) after N=5 more pixels; (d) after N-R+1=3 more pixels.
Due to board and memory characteristics, this poses an additional limitation: the maximum-frequency clock available (33.33 MHz, i.e. four times the PAL video standard f_play) generates an event (either a rising or a falling edge) every 15 nsec. This matches neither the 30 nsec access time of the EDO DRAM (present on recent versions of the G800 board) nor the 60 nsec access time DRAM (present on our prototype board), so performance has to be reduced by using only some edges of the clock waveform.

To avoid this problem, strictly related to the type of board and memory chips available, we exploited the second RAM port present on the XMOD modules to perform two memory accesses at a time. Clearly, memory writes and reads must be properly alternated on the two ports. An example is reported in Fig. 7.

With this approach, the FPGAs make two memory accesses per video clock cycle to each of the two available RAM banks, allowing an upper limit of a 16x16 convolution matrix with 30 nsec access time EDO DRAM, and an upper limit of an 8x8 kernel with 60 nsec access time DRAM. In fact, in the latter case, the FPGAs can perform only one memory access per play clock cycle to each of the two banks.

5. Conclusion and future work

In this paper, we have presented the architecture of a real-time convolver implemented using FPGAs. In particular, we have focused on the constraints that limit the maximum matrix dimension, which is directly related to the effectiveness of the convolver. We have described a possible solution to relax these constraints, based on data parallelism.

In our prototype, we started testing the multi-FPGA architecture with several 3x3 convolution matrices, e.g., the Sobel, Prewitt, Kirsch, Laplace and Cantoni operators [2] [3]. The limited dimension of these convolution matrices allowed us to perform the convolution row by row on three different YPGAs, without requiring first-level FIFO memories, thus avoiding the two external memory accesses discussed in the previous section. Second-level FIFOs and final calculations are performed by the XPGAs and the VMC, respectively. This choice allowed us to meet the real-time constraints on our G800 prototype board as well: in fact, even with a 60 nsec access time DRAM, the dual port used by the XPGAs allows second-level FIFOs to be implemented.

The next step will be the implementation of an 8x8 convolution matrix, spreading the 8 convolution rows over 4 YPGAs and again using the XPGAs for second-level FIFOs. Since each YPGA has to process two convolution rows, first-level FIFOs must be implemented, requiring two external memory accesses per video clock as in the case of the XPGAs (thus still viable on our G800 board).

Larger convolution matrices can be considered only using new versions of the G800 board, featuring faster DRAMs.

A more significant evolution of the present architecture is the implementation of a series of convolutions. In fact, the experiments on the 3x3 matrices showed that both YPGAs and XPGAs are underutilized (25% to 30% of CLBs not used). It then seems feasible to use both YPGAs and XPGAs for computation and FIFO management, thus allowing two subsequent convolutions to be performed on the same image. This can be useful for an initial image filtering (to enhance the signal-to-noise ratio) or to perform edge detection and computation of the brightness gradient (as required, for instance, by the Hough transform).

References

[1] D. Buell (editor), "Splash 2: FPGAs in a Custom Computing Machine", IEEE Computer Society Press, 1996.
[2] Virginio Cantoni, Stefano Levialdi, "La visione delle macchine - Tecniche e strumenti per l'elaborazione di immagini e il riconoscimento di forme", Tecniche Nuove, Milano, 1990.
[3] Vito Cappellini, "Elaborazione numerica delle immagini", Boringhieri, Torino, 1985.
[4] The Programmable Logic Data Book, Xilinx, San Jose, CA, 1996.
[5] Peter Alfke, "Synchronous and Asynchronous FIFO Designs", Xilinx Application Note XAPP 051, Version 2.0, September 1996.
[6] Jazi Eko Istiyanto, "The Formation of Super-cliques in the Behavioural Synthesis of FIFO RAMs", Tech Report, Gadjah Mada University.
[7] P. Athanas and L. Abbott, "Addressing the Computational Requirements of Image Processing with a Custom Computing Machine: An Overview", in Proceedings of the 2nd Workshop on Reconfigurable Architecture, Santa Barbara, CA, April 1995.
[8] Stephen L. Wasson, "FPGA Design: Early Implications in Partitioning", Integrated System Design magazine, February 1997.
[9] S. Hauck, "Multi-FPGA Systems", PhD thesis, Department of Computer Science, University of Washington, September 1995.
[10] IEEE Standard VHDL Language Reference Manual, IEEE Standards.
[11] FPGA Compiler User Guide v3.5, Synopsys, Mountain View (USA), September 1996.
[12] Giga Operations Spectrum Documentation, Giga Operations Corporation, Berkeley, CA, 1995.

FPGAs Memory Synchronization and Performance Evaluation Using The Open Computing Language Framework
8 pages
FPGACam - Real Time Video Processing
No ratings yet
FPGACam - Real Time Video Processing
16 pages
An FPGA Spectrum Sensing Accelerator For Cognitive Radio: Background
No ratings yet
An FPGA Spectrum Sensing Accelerator For Cognitive Radio: Background
2 pages
VHDL Labs - Foreword: Evaluation
No ratings yet
VHDL Labs - Foreword: Evaluation
12 pages
FPGA-based Direct Conversion Receiver With Continuous Acquisition To A PC
No ratings yet
FPGA-based Direct Conversion Receiver With Continuous Acquisition To A PC
25 pages
Implementation of Video Processing Techniques On A Field Programmable Gate Array Development Platform
No ratings yet
Implementation of Video Processing Techniques On A Field Programmable Gate Array Development Platform
45 pages
Chapter 09
No ratings yet
Chapter 09
134 pages
Section 8 Frequency Response Design
No ratings yet
Section 8 Frequency Response Design
60 pages
Section 7 Frequency Response Analysis
No ratings yet
Section 7 Frequency Response Analysis
113 pages
Section 6 Root Locus Design
No ratings yet
Section 6 Root Locus Design
98 pages
A Copyright Protection Scheme For Digital Images Based On Shffled Singular Value Decomposition and Visual Cryptography
No ratings yet
A Copyright Protection Scheme For Digital Images Based On Shffled Singular Value Decomposition and Visual Cryptography
22 pages
Matlab Practical File: Sri Sai Institute of Engg. and Technology
No ratings yet
Matlab Practical File: Sri Sai Institute of Engg. and Technology
26 pages
(Tabatabaian, Mehrzad) COMSOL5 For Engineers (B-Ok - Xyz)
100% (2)
(Tabatabaian, Mehrzad) COMSOL5 For Engineers (B-Ok - Xyz)
335 pages
Application of Texture Attribute Analysis To 3D Seismic Data
No ratings yet
Application of Texture Attribute Analysis To 3D Seismic Data
5 pages
Adaptive Nulling Communications: Antenna For Satellite
No ratings yet
Adaptive Nulling Communications: Antenna For Satellite
22 pages
Kalasalingam Academy of Research and Education Office of The Controller of Examinations
No ratings yet
Kalasalingam Academy of Research and Education Office of The Controller of Examinations
50 pages
Identification and Control of Mechanical Systems 1st Edition Jer-Nan Juang - The ebook is available for quick download, easy access to content
100% (2)
Identification and Control of Mechanical Systems 1st Edition Jer-Nan Juang - The ebook is available for quick download, easy access to content
47 pages
MIMO For Dummies
No ratings yet
MIMO For Dummies
7 pages
Assessing Lateral Period of Building Frames Incorporating Soil-Flexibility by K.bhattacharya & S.C.dutta
No ratings yet
Assessing Lateral Period of Building Frames Incorporating Soil-Flexibility by K.bhattacharya & S.C.dutta
27 pages
Download ebooks file Kinematics and Dynamics of Mechanical Systems Implementation in MATLAB and SimMechanics 1st Edition Kevin Russell all chapters
100% (10)
Download ebooks file Kinematics and Dynamics of Mechanical Systems Implementation in MATLAB and SimMechanics 1st Edition Kevin Russell all chapters
40 pages
2018 - Stability Analysis and Stabilization Methods of DC Microgrid With Multiple Parallel-Connected DC-DC Converters Loaded by CPLs
No ratings yet
2018 - Stability Analysis and Stabilization Methods of DC Microgrid With Multiple Parallel-Connected DC-DC Converters Loaded by CPLs
11 pages
Nozzlepro
No ratings yet
Nozzlepro
20 pages
Pages From Engineering Mathematics John BIRD
No ratings yet
Pages From Engineering Mathematics John BIRD
7 pages
Maths s4 Draft
No ratings yet
Maths s4 Draft
237 pages
Hilbert Space Problems
100% (2)
Hilbert Space Problems
77 pages
Determinants One Shot - Vmath
No ratings yet
Determinants One Shot - Vmath
74 pages
The Causes Mechanical Aspects of Deformation
No ratings yet
The Causes Mechanical Aspects of Deformation
35 pages
Functoriality, Singular Homology, Relative Homology: Induced Maps
No ratings yet
Functoriality, Singular Homology, Relative Homology: Induced Maps
10 pages
Matrix Analysis of Plane and Space Frames - Devdas Menon
100% (1)
Matrix Analysis of Plane and Space Frames - Devdas Menon
196 pages
Matrix-Fracture Transfer Shape Factors For Dual-Porosity Simulators (Lim & Aziz)
No ratings yet
Matrix-Fracture Transfer Shape Factors For Dual-Porosity Simulators (Lim & Aziz)
10 pages
Thurber Eberhart Phillips
No ratings yet
Thurber Eberhart Phillips
10 pages
2 PDF
No ratings yet
2 PDF
27 pages
Transition Representation
No ratings yet
Transition Representation
12 pages
Gaussian Elimination: Example 1: Solve This System
No ratings yet
Gaussian Elimination: Example 1: Solve This System
41 pages
Origin C Programming Guide E
No ratings yet
Origin C Programming Guide E
266 pages
Skills Matrix
No ratings yet
Skills Matrix
24 pages
Mathematics Syllabus
No ratings yet
Mathematics Syllabus
31 pages
Download Full Elementary linear algebra : Metric version Eighth Edition Ron Larson PDF All Chapters
100% (1)
Download Full Elementary linear algebra : Metric version Eighth Edition Ron Larson PDF All Chapters
65 pages

Image Convolution On FPGAs The Implementation of A multi-FPGA FIFO Structure

Uploaded by

Image Convolution On FPGAs The Implementation of A multi-FPGA FIFO Structure

Uploaded by

Image Convolution on FPGAs:

the Implementation of a Multi-FPGA FIFO Structure

Arrigo Benedetti**, Andrea Prati*, Nello Scarabottolo*

1. Introduction

... processing latency.

Figure 2. Memory structure allowing real-time convolution.
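
A memory structure for real-time convolution lets each output be produced as soon as the last pixel of its window arrives, without storing the whole frame. The general line-buffer idea can be sketched in Python as below. This is a hypothetical single-device model, not the authors' multi-FPGA design. Assumptions: K-1 chained delay lines of one image row each (width N), a K x R window of shift registers, zero-initialized storage, and the kernel applied without flipping (a correlation; flip the kernel for a strict convolution). The names `stream_convolve` and `direct_convolve` are illustrative only.

```python
from collections import deque


def stream_convolve(pixels, N, kernel):
    """Hypothetical line-buffer convolver for a raster-scan pixel stream.

    pixels: iterable of pixel values in raster order, rows of width N.
    kernel: K x R nested list of weights, applied without flipping.
    Yields (y, x, value) for every top-left position (y, x) where the
    K x R window lies entirely inside the image.
    """
    K, R = len(kernel), len(kernel[0])
    # Assumed structure: K-1 chained delay lines of one row each;
    # delay line k outputs the pixel that arrived (k + 1) * N steps ago.
    lines = [deque([0] * N, maxlen=N) for _ in range(K - 1)]
    # K x R window of shift registers: window[k][c] holds the pixel
    # k rows above and c columns left of the newest pixel.
    window = [deque([0] * R, maxlen=R) for _ in range(K)]
    for t, p in enumerate(pixels):
        row, col = divmod(t, N)
        # Vertically aligned column: the new pixel plus the oldest entry
        # of each delay line (same column, 1..K-1 rows above).
        column = [p] + [line[0] for line in lines]
        # Advance the delay lines: line 0 is fed by the new pixel,
        # line k by the entry that just left line k-1.
        for k, line in enumerate(lines):
            line.append(column[k])
        # Shift the new column into the window registers.
        for k in range(K):
            window[k].appendleft(column[k])
        # Emit once the window spans K complete rows and R columns.
        if row >= K - 1 and col >= R - 1:
            value = sum(kernel[i][j] * window[K - 1 - i][R - 1 - j]
                        for i in range(K) for j in range(R))
            yield row - K + 1, col - R + 1, value


def direct_convolve(img, kernel):
    """Reference: the same window sum computed directly on a 2D image."""
    K, R = len(kernel), len(kernel[0])
    H, N = len(img), len(img[0])
    return {(y, x): sum(kernel[i][j] * img[y + i][x + j]
                        for i in range(K) for j in range(R))
            for y in range(H - K + 1) for x in range(N - R + 1)}
```

Feeding a flattened image through `stream_convolve` should give the same values as `direct_convolve`, while storing only (K-1)*N delay-line pixels plus K*R window pixels. That bounded storage is what makes single-pass, real-time operation possible.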

[Figure panels (shift registers SHIFT1, SHIFT2, SHIFT3): after the first R-1=2 pixels; after N-R+1=3 more pixels; after N=5 more pixels; after N-R+1=3 more pixels.]

Figure 4. Behavior of the external FIFO (N=5, K=R=3).

[Figure panels (shift registers SHIFT1, SHIFT2, SHIFT3): after the first R-1=2 pixels; after N-R+1=3 more pixels.]
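
Each new incoming pixel requires one write and K-1 reads of the external FIFO. One hypothetical way to obtain exactly that access pattern is a single RAM of (K-1)*N words addressed as a circular buffer: the K-1 reads fetch the pixels in the same column of the previous K-1 rows, and the last of them reads the very address that the new pixel then overwrites. The class name `ExternalFifo`, the address arithmetic and the zero initialization are assumptions; the paper's split of the FIFO between external memory and on-FPGA flip-flops, and its multi-FPGA partitioning, are not modeled.

```python
class ExternalFifo:
    """Hypothetical model of an external-memory FIFO for a K-row window.

    One RAM of (K-1)*N words stores the last K-1 image rows as a circular
    buffer (an assumed layout). Each push does K-1 reads, then one write.
    """

    def __init__(self, N, K):
        if K < 2:
            raise ValueError("a FIFO is only needed for K >= 2 rows")
        self.N, self.K = N, K
        self.depth = (K - 1) * N
        self.ram = [0] * self.depth  # zero-initialized memory (assumption)
        self.ptr = 0                 # next write address
        self.reads = 0
        self.writes = 0

    def push(self, pixel):
        """Accept one raster-scan pixel; return the K-pixel column.

        column[k] is the pixel k rows above the new one (0 before the
        FIFO has filled). Reads happen before the write, so the read for
        k = K-1 sees the old contents of the address about to be reused.
        """
        column = [pixel]
        for k in range(1, self.K):
            addr = (self.ptr - k * self.N) % self.depth
            column.append(self.ram[addr])
            self.reads += 1
        self.ram[self.ptr] = pixel
        self.writes += 1
        self.ptr = (self.ptr + 1) % self.depth
        return column
```

For N=5 and K=3, pushing the pixel values 0 to 14 returns [12, 7, 2] for pixel 12 (row 2, column 2), and the counters end at 15 writes and 30 reads, i.e. one write and K-1=2 reads per pixel.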
