Review on Image Processing: FPGA Implementation Perspective (IJIRCST, 2014)
Abstract— Digital image processing (DIP) is an ever-growing area with a variety of applications, including medicine, video surveillance, and many more. In order to improve the performance of DIP systems, image processing algorithms are implemented in hardware instead of software; the idea is mainly to obtain a system faster than software image processing. Image processing tasks such as filtering, stereo correspondence and feature detection are inherently highly parallelizable. Thus FPGAs (Field Programmable Gate Arrays) can be a useful approach in the area of digital signal processing. FPGAs provide the advantages of parallelism, low cost, and low power consumption. They are semiconductor devices that contain a number of logic blocks, which can be programmed to perform anything from basic digital gate-level functions to complex image processing algorithms. This paper provides an overview of the various works that demonstrate the benefits of using FPGAs to implement image processing algorithms such as median filtering, morphological operations, convolution, smoothing and edge detection. Gray-level images are very common in image processing; these images use eight bits to code each pixel value, which results in 256 different possible shades of grey, ranging from 0 (black) to 255 (white). The latest generations of FPGAs can compute more than 160 billion multiply-accumulate (MAC) operations per second.

Index Terms— FPGA, Digital Image Processing (DIP), algorithms

I. INTRODUCTION

Digital image processing is the processing and display of images. Image processing is used for the modification of the image. There are three main categories of image processing: image enhancement, image restoration, and image classification. Image enhancement provides a more effective display of data for visual interpretation. It helps a user to view the image and recognize different segments of an image; an example of this is to edit the shades in an image. This technique is very useful for assisting with the distinction of different objects in an image. Rectification and restoration of an image is another important aspect of image processing. It deals largely with image correction, which may be necessary because the image is affected by geometric distortion or noise. It can also remove blurring, whereby a poor-quality image may be upgraded to one with better quality and distinguishable features. Image classification is where images are classified based on the colors or shapes present in the image; this can be useful in order for a computer to differentiate between different types of images.

There are many useful applications of image processing. It is used in remote sensing, robot guidance, and target recognition. It is also used for industrial inspection, and in medical technology such as X-ray enhancement. A very useful application of digital image processing is to view the various color intensities present in an image and split the image into segments based on the results. The biggest performance bottleneck is the time involved in processing the images captured by the camera. Implementing such applications on a general-purpose computer can be easier, but it is not very time efficient due to additional constraints on memory and other peripheral devices. This leads to exploring possible hardware-based alternatives. Recently, image processing algorithms implemented in hardware have emerged as the most viable solution for improving the performance of image processing systems, and the introduction of reconfigurable devices and system-level hardware programming languages has further accelerated the design of DIP in hardware. FPGAs are often used as implementation platforms for real-time image processing applications. A Field Programmable Gate Array (FPGA) is a programmable (or reconfigurable) device [1] in which the final logic structure can be directly configured by the end user. An FPGA consists of an array of uncommitted elements that can be programmed or interconnected (configured) according to a user's specification in a virtually limitless number of ways. Being reprogrammable and easily upgradable, an FPGA offers a compromise between the flexibility of general-purpose processors and the hardware-based speed of ASICs. FPGAs allow rapid prototyping of a system and offer an inexpensive option to validate system requirements [2]. Placing the functionality of image processing applications onto hardware allows faster processing, as it is no longer necessary to split the individual instructions into the fetch, decode and apply cycle needed in the typical processing unit of a computer.

In this paper, a survey of implementations of image processing applications on FPGAs, with an emphasis on the salient features of FPGAs, is presented. The rest of the paper is organized as follows. Section II highlights the advantages and limitations of FPGAs. Section III details algorithm mapping and window-based operators. Section IV describes the various filtering algorithms, such as convolution filtering and median filtering. Section V describes histogram-based algorithms. Section VI describes motion-based algorithms. Finally, section VII summarizes prior research in the FPGA implementation of image processing algorithms.
II. EVALUATION OF FPGAS AS A PLATFORM FOR DEVELOPING DIP APPLICATIONS

A. Advantages of FPGAs
Many advantages of FPGAs make them a preferred choice of implementation in the DIP realm. Based on the survey, the most significant features are as follows:
a) A characteristic of many image-processing methods is the multiple, iterative processing of data sets, such as the four stages of the Canny edge detector, which require performing several passes over the image. These steps, which have to be performed sequentially on a general-purpose computer, can be fused into one pass on an FPGA, as its structure is able to exploit spatial and temporal parallelism. An FPGA can process multiple image windows in parallel, and multiple operations within one window in parallel as well.
b) By employing optimization techniques such as loop fusion and loop unrolling, efficient usage of FPGA resources and speed-up of implementations is possible, since many redundant operations are avoided.
c) FPGAs are capable of parallel I/O, which allows them to read (from memory), process and write (to memory) simultaneously. Many operations such as convolution and square-root extraction can be executed much faster by using pipelining and parallelism.
d) All of the logic in an FPGA can be rewired, or reconfigured, with a different design as often as the designer likes. This type of architecture allows a large variety of logic designs (dependent on the device's resources), which can be interchanged for a new design as soon as the device is reprogrammed.
e) FPGAs provide the flexibility to reprogram and upgrade to new standards. Easy upgradeability ensures that FPGA solutions evolve quickly with no risk of obsolescence.
f) The reusability and efficiency of hardware implemented on an FPGA is especially useful in developing image processing IP (intellectual property), as it allows an efficient system in terms of cost and performance. The possibility of quickly integrating IP blocks without modification or repetition of the verification cycle [3] simplifies debugging and thus greatly reduces the time-to-market.
g) Because of the LUT-based architecture, some convolution masks (such as constant coefficient multipliers, or KCMs) can be implemented very efficiently [4].
h) High computational density together with low development costs allows even the lowest-volume consumer market to bear the development costs of FPGAs. In fact, compared to ASICs, FPGAs are especially attractive for lower-volume applications. With low-cost FPGAs, high-definition solutions can now be implemented for less than US$1.00 per 1,000 logic elements (LEs).
A lot of research has recently been done on utilizing FPGAs as a development platform for DIP algorithms. In this paper the related work in this area is presented.

B. Limitations of FPGAs
On the other hand, FPGAs also have limitations for image processing applications.
a) There are many overheads in FPGA design. These include the data transfer time, i.e. the time required to upload (or download) the data from (or to) the host to (or from) the reconfigurable processor, and the time needed for reconfiguration.
b) FPGAs are an excellent choice only for those algorithms which do not use floating-point or other complex mathematics. Division, direct multiplication, etc. are complex and expensive on an FPGA. Hence, designers have to reformulate their algorithms and avoid complex mathematics (e.g. implementing a divide by 8 using the bit-shifting method of division instead of a divide by 9).
c) Current FPGAs cannot be reconfigured quickly, as the process of modifying or combining FPGA circuits is laborious.
d) The size of memory that can be implemented using standard logic cells on an FPGA is limited, as implementing memory is an inefficient use of FPGA resources.
e) Routines in which complex tasks cannot be broken down into simpler tasks must use a more serial method of processing, which is not very efficient on FPGAs.
f) Hardware offers much greater speed than a software implementation, but it comes at the price of the increased development time inherent in creating a hardware design. Most software designers are familiar with C, but in order to develop a hardware system one must either learn a hardware description language such as VHDL or Verilog, or use a software-to-hardware conversion scheme such as Streams-C, which converts C code to VHDL, or MATCH, which converts MATLAB code to VHDL.

III. ALGORITHM MAPPING

Algorithms for image processing are normally classified into one of three levels: low, intermediate or high. Low-level algorithms operate on individual pixels or neighborhoods [5]. Intermediate-level algorithms either convert pixel data into a different representation, such as a histogram, coordinate or chain code, or operate on one of these higher representations. High-level algorithms aim to extract meaning from the image using information from the other levels; this could be in the form of rejecting a component or identifying where an object is within an image. When moving from the low- to the high-level representations there is a corresponding decrease in exploitable parallelism due to the change from pixel data to more descriptive representations. However, there is also a reduction in the amount of data that must be processed, allowing more time to do the processing. Due to their structure, FPGAs are most appropriate for computationally intensive tasks, which form the vast majority of low- and intermediate-level operations; the large data sets and the regular, repetitive nature of the operations can be exploited. For this reason it has been traditional in many systems for the FPGA to handle the low-level operations and then pass the processed data to a microprocessor which executes the high-level operations. With increasing FPGA size, it is now possible to implement processor cores on the reconfigurable fabric, which means the FPGA can form the core of the system.

Low-level image processing operators can be classified as local operators (point operators and window operators) and global operators, with respect to the way the output pixels are computed from the input pixels [6]. The local operators depend on data from a relatively small neighborhood that is local in the spatial and temporal dimensions; examples include thresholding, convolution, and motion estimation. Global operators depend on data from the entire image; examples include transforms like the fast Fourier transform and principal component analysis, as well as statistical histogram techniques.

A. Window-based Image Operators
Window-based operators need only partial or local information about the image; that is, they are restricted to a small neighborhood of image data centered on a reference pixel. A window-based image operator is performed when a window with an area of w × w pixels is extracted from the input image and transformed according to a window mask or kernel, and a mathematical function produces an output result [7]. The window mask is the same size as the image
window, and its values are constant through the entire image processing; the function is applied independently at all pixel locations. The values used in the window mask depend on the specific type of features to be detected or recognized. Usually a single output value is produced by each window operation and it is stored in the corresponding central position of the window, as shown in Fig 1.
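To make the window operator concrete, the following short Python/NumPy sketch models the generic w × w window operation described above. It is only a software reference model for illustration; the function name, the 3×3 equal-weight example and the zero-border handling are assumptions of this sketch rather than details taken from any of the surveyed FPGA designs.

import numpy as np

def window_operator(image, mask):
    """Slide a w x w mask over a gray-level image and store each result at
    the centre position of the window; border pixels are left at zero."""
    h, w = image.shape
    k = mask.shape[0]              # window size, assumed odd (e.g. 3)
    r = k // 2                     # window radius
    out = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = image[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
            out[y, x] = np.sum(window * mask)   # mask applied, single output value
    return out

# Example: 3x3 equal-weight mask (smoothing); dividing the sum by 9 gives the mean.
img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
result = window_operator(img, np.ones((3, 3), dtype=np.int32)) // 9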
IV. FILTERING ALGORITHMS

A. Convolution Filtering
Low-pass filters use positive weights and are used for image smoothing. High-pass filters use a kernel with a positive center weight and negative outer weights and are used to enhance high-frequency components in an image such as edges and fine detail. If the filter size is m × m, then each output pixel depends on m × m adjacent pixels, and thus m² multiplies and adds are required at each site. Therefore, high performance can be achieved by exploiting parallelism.

There are various ways to implement a convolution filter. The convolution algorithm uses adders, multipliers, and dividers to calculate its output. On FPGAs, heavy use of arithmetic tends to slow down performance, so many designers favor techniques that reduce the algorithm's dependency on complex mathematics. Another obstacle in this algorithm's design is the capability to handle negative numbers: in a proper convolution, the mask can (and often does) consist of negative numbers, so the VHDL has to handle these numbers by using signed data types. Signed data simply means that a negative number is interpreted as the 2's complement of its non-negative dual. This means that all vectors within the design must use an extra bit compared to unsigned numbers; the extra bit carries the sign of the number, 0 for a positive number and 1 for a negative number. Addition and multiplication are instantiated using the simple + and * operators in the VHDL code. Since a proper convolution involves a division by the number of pixels in the window, some thought has to be put into this part of the algorithm's hardware implementation. Hardware dividers on FPGAs are quite large and slow, so instead of division the bit-shifting method is used. Fig 3 shows a graphic representation of the mathematics of the hardware convolution. Note that a valid output of the convolution algorithm occurs six clock cycles after the first window is valid; since the design is pipelined and runs in the megahertz range, this kind of startup latency has very little effect on overall design speed [10], [11].

Fig 3: Hardware Design of Convolution

The main concerns in implementing and computing convolution are speed, area and power, which affect the whole image processing system. Implementing the algorithm in parallel hardware speeds up the process, but the implementation itself is very complex and requires a large silicon area. Optimization of the convolution algorithm can be achieved easily if the kernel specification is limited. For example, if all coefficients in the kernel are powers of two, the VHDL synthesizer is able to produce a design that uses fewer resources. This is due, of course, to the way numbers are represented in digital systems, where a number that is a power of two is represented with only one set bit. Further optimization is possible by reducing the bit widths of the kernel constants. This results in a smaller coefficient data range, but the compromise may be acceptable in certain cases.
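The arithmetic described above can be illustrated with a small software reference model. The sketch below (Python/NumPy, with illustrative names only) performs a 3×3 convolution with signed accumulation and replaces the final division by an arithmetic right shift, for example normalising a 9-pixel sum by dividing by 8 (a shift by 3) instead of by 9, in the spirit of the bit-shifting method mentioned above; it is not the reviewed VHDL design itself.

import numpy as np

def convolve3x3_shift(image, kernel, shift):
    """Reference model of the hardware convolution: signed multiply-accumulate
    over a 3x3 window, with an arithmetic right shift (division by a power of
    two) in place of a true divider."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.int32)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = image[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
            acc = int(np.sum(win * kernel))   # signed MAC, like the VHDL signed types
            out[y, x] = acc >> shift          # bit shift replaces the divider
    return np.clip(out, 0, 255).astype(np.uint8)

# 3x3 smoothing: the 9-pixel sum is normalised by >> 3 (divide by 8 rather than 9).
smoothing_kernel = np.ones((3, 3), dtype=np.int32)
img = np.random.randint(0, 256, (16, 16), dtype=np.uint8)
smoothed = convolve3x3_shift(img, smoothing_kernel, shift=3)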
B. Median Filtering
Median filtering is a powerful instrument used in image processing. The median filter is a non-linear filter which is commonly used to remove impulsive noise from images while preserving edges and other details. Two common types of impulsive noise are salt-and-pepper noise and random-valued noise. Impulsive noise replaces the intensities associated with a certain percentage of pixels by the maximum or minimum possible intensity (salt-and-pepper noise) or by any value between the maximum and minimum intensity (random-valued noise). A median filter is effective in removing this type of noise without affecting the distinguishing characteristics of the signal, and it can outperform linear low-pass filters, which also smooth out the edges and other details present in the original image. A standard median operation is implemented by sliding a window of odd size (e.g. a 3×3 window) over an image [12]. At each window position the sampled values of the signal or image are sorted, and the median value of the samples replaces the sample in the center of the window, as shown in Fig 4.

Fig 4: Median Filter

Median filtering is usually based on data sorting algorithms, including bubble sort, quick sort, and insertion sort, and several techniques based on these algorithms have been proposed in the literature for implementing median filters in hardware. In these sorting schemes, the incoming pixels pass through a network of comparators and swapping units: the comparators compare two to three incoming pixels at once, while the swapping units sort them accordingly. It can be mentioned at this point that finding the median of a sequence of size (2N+1) using bubble sort requires N(2N+1) sorting units and (2N+1) registers, and as the window size increases, the number of compare-and-swap units increases significantly. A counter must be used to tell the output data-valid signal when to change to its 'on' state. Since it is desired that the output image be the same size as the input image, and use of the window generator effectively reduces the amount of valid output data, borders of zero-valued pixels must be placed around the image. In order to do this properly, counters are used to tell the algorithm when the borders start. A VHDL counter is written to count pixel movement as the data streams into the entity; since images are two-dimensional data, two counters are needed, one to count rows and one to count columns in the image. Optimization techniques may lead to some reduction in the number of these units. The parallel sorting strategy leads to a significant reduction compared to the wave sorter approach. In this strategy, it is necessary to consider the total number of steps required to sort an array, that is, the steps used to read data from memory and the steps required to store the sorted data back to memory [13]. With this kind of method, data can be stored in the array by sending a datum to the first register; later, when the second datum is sent to the first register, the value in the first register is shifted to the second register. The necessary number of steps for sorting is equal to the number of elements in the biggest group of identical elements divided by 2. The parallel sorting strategy is shown in Fig 5. Each node is a two-element sort, with the
lower input exiting the node on the left, the higher input leaving on
the right.
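The compare-and-swap behaviour described above can be modelled in software as follows. This Python/NumPy sketch uses a fixed odd-even transposition network for the nine samples of a 3×3 window and is only an illustrative reference model; the window size, the zero border and the particular sorting network are assumptions, not details of the cited hardware designs.

import numpy as np

def compare_swap(a, b):
    """Model of a single comparator/swap unit: returns (min, max)."""
    return (a, b) if a <= b else (b, a)

def median9(values):
    """Median of nine samples using an odd-even transposition network,
    i.e. a fixed arrangement of compare-and-swap units."""
    v = list(values)
    for stage in range(9):                 # nine stages fully sort nine samples
        for i in range(stage % 2, 8, 2):
            v[i], v[i + 1] = compare_swap(v[i], v[i + 1])
    return v[4]                            # middle element of the sorted nine

def median_filter3x3(image):
    """Slide a 3x3 window over the image and replace the centre sample with
    the median; a zero border keeps the output the same size as the input."""
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = median9(image[y - 1:y + 2, x - 1:x + 2].ravel())
    return out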
Table I: Synthesis results for the median filter with window size 3×3 on a Xilinx device

Image size    Processing time (ms)    Utilization of slices (in percent)
4×4           0.00246                 4 (2000 out of 4656)
16×16         0.11766                 4 (221 out of 4656)
32×32         0.54006                 5 (237 out of 4656)
64×64         2.306                   5 (250 out of 4656)
128×128       9.525                   5 (259 out of 4656)
C. Morphological algorithm
The morphological algorithms refer to a class of algorithms that transform the geometric structure of an image. Morphology can be used on binary and gray-scale images, and is useful in many areas of image processing, such as reconstruction, edge detection, restoration and texture analysis. Morphological operators are defined as combinations of basic numerical operations taking place over an image by using a structuring element. The structuring element is a window that scans over an image and modifies it according to some specified rule. The most basic morphological operations are dilation and erosion. Dilation adds pixels to the boundaries of objects in an image, while erosion removes pixels on object boundaries. The number of pixels added to or removed from the objects in an image depends on the size and shape of the structuring element used to process the image. In the morphological dilation and erosion operations, the state of any given pixel in the output image is determined by applying a rule to the corresponding pixel and its neighbors in the input image.

Rules for Dilation and Erosion:
Erosion - The value of the output pixel is the minimum value of all the pixels in the input pixel's neighborhood. In a binary image, if any of the pixels is set to 0, the output pixel is set to 0, based on the logical AND relationship [14]. Erosion can be used to eliminate unwanted white noise pixels from an otherwise black area; a white pixel will remain white in the output image only if all of its neighbors are white.
Dilation - The value of the output pixel is the maximum value of all the pixels in the input pixel's neighborhood. It uses the NAND rather than the AND logical operation. Being the opposite of erosion, dilation will allow a black pixel to remain black only if all of its neighbors are black. This operator is useful for removing isolated black pixels from an image. Binary erosion and dilation masks are shown below.

(Erosion and Dilation masks)

Fig 6: Input image and output image

V. HISTOGRAM BASED ALGORITHMS

Image enhancement involves techniques to sharpen image features such as edges, boundaries or contrast, to make a graphic display more useful for display and analysis. There are various spatial-domain image enhancement techniques, such as the median filter, contrast stretching, the negative image transformation, the power-law transformation and histogram equalization. The median filter, already discussed, is a non-linear, low-pass filtering method which is used mainly to remove salt-and-pepper noise from an image. Contrast stretching attempts to improve an image by stretching the range of intensity values it contains to make full use of the possible values; it is restricted to a linear mapping of input to output values. The negative transformation reverses the grey-level intensities of the image, thereby producing a negative-like image. The power-law transformation is also called gamma correction and is given by the expression c·r^γ; for various values of γ, different levels of enhancement can be obtained. For achieving high performance, image enhancement methods are implemented on FPGAs.

A. Histogram Equalization
Histogram equalization is one of the commonly used image enhancement techniques. It is considered to be the most popular because of its simplicity and its good performance on all types of images. Histogram equalization is a transformation that stretches the contrast by redistributing the gray-level values uniformly [15]. Digital images are represented as two-dimensional pixel arrays, where each pixel indicates the brightness or color of the image at a given point. Suppose we have an image which is predominantly dark; then its histogram would be concentrated towards the lower end of the grey scale, and all the image detail would be compressed into the dark end of the histogram. If we could 'stretch out' the grey levels at the dark end to produce a more uniformly distributed histogram, then the image would become much clearer. Histogram equalization creates an image with equally distributed brightness levels over the whole brightness scale (0-255). It maximizes the overall contrast; a nearly uniform (i.e. flat)
distribution is produced. The histogram equalization algorithm is implemented on FPGAs, which provide finer flexibility and powerful computing capability. Fig 7 shows the block diagram of the histogram equalization algorithm.

Fig 7: Block Diagram of the Histogram Equalization algorithm

The ROM is initialized with the image. The histogram block counts the occurrence of each gray value of the image in a 1-D array. The cumulative block adds each value in the array to the previous one, the multiplier block multiplies the CDF (cumulative distribution function) array by a constant value, and the mapping block then maps each pixel of the unmodified image to the corresponding value in the new matrix. The minimum time period in this system is 5 ns for a test image of 100×100. It has been found that the computation speed can be improved further by considering optimizations in the FPGA implementation.
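For reference, the block diagram above maps naturally onto the following software model (Python/NumPy). The scaling formula and the dark, low-contrast test image are illustrative assumptions of this sketch, not details of the FPGA implementation being reviewed.

import numpy as np

def histogram_equalize(image):
    """Software model of the blocks above: histogram of grey levels,
    cumulative (CDF) block, multiplication by a constant scale factor,
    and a final mapping of every pixel to its new grey level."""
    hist = np.bincount(image.ravel(), minlength=256)      # histogram block
    cdf = np.cumsum(hist)                                 # cumulative block
    cdf_min = cdf[cdf > 0][0]
    denom = max(int(image.size - cdf_min), 1)
    lut = np.clip(np.round((cdf - cdf_min) * 255.0 / denom), 0, 255)  # multiplier block
    return lut.astype(np.uint8)[image]                    # mapping block

img = np.random.randint(40, 90, (100, 100), dtype=np.uint8)   # dark, low-contrast test image
equalized = histogram_equalize(img)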
B. Thresholding technique
The histogram-based method [16] can be used to determine the threshold value for image binarization. For extracting useful information from an image, this algorithm divides the pixels into two groups, background and foreground objects, and then the optimum threshold is obtained. All pixels above a determined (threshold) grey level are assigned to the object, and all pixels below that level are assumed to be outside the object (background). An object pixel is given a value of "1" while a background pixel is given a value of "0", which gives the binary output of thresholding. The selection of the threshold level is very important, as it will affect any measurements of parameters concerning the object, and it is complicated by poor contrast, inconsistency between the sizes of object and background, non-uniformity in the background, and correlated noise. The threshold level is normally taken as the lowest point in the trough between the two peaks of the histogram, or the mid-point between the two peaks may be chosen. For a hardware implementation of the thresholding method, speed and complexity have to be considered, and these conditions are met by using FPGAs. The hardware architecture is based on a weight-based clustering threshold algorithm in which the gray-level pixels of an image are divided into two clusters, foreground and background. The hardware includes three major modules: the memory controller unit, the weight-updating unit, and the thresholding unit. The modules and their interconnections are illustrated in Fig 8.

Fig 8: Thresholding block diagram

The "memory controller unit" generates the timing of the control signals used to send/receive image pixel data to/from the memory. The "weight-updating unit" is in fact an arithmetic processor that calculates the weights and threshold values. Each input pixel is read from memory, compared with the weights, and the closer weight is updated; the update is based on the difference between the input pixel and the weight, scaled by a learning-rate factor. Once a complete frame of the image has been processed, the centers of the background and foreground clusters are computed. The "thresholding unit" determines the threshold value by averaging the weights. Then every single pixel of the same image is fetched from memory via the memory controller unit, each read pixel is compared to the threshold value, and the result is written back to the memory.

In this algorithm, real numbers are needed for some parts of the numerical computation. Such non-integer representations, for example floating point, allow a wide range of values to be represented, but floating-point arithmetic units consume significantly greater hardware resources than integer arithmetic, which makes them suitable only for million-gate FPGAs like the Xilinx Virtex series. Many enhancements and optimizations have been proposed for real-number arithmetic. Since the resources of the current FPGA device are limited, and because the focus of this algorithm is not on high numerical precision, all numbers are represented as integers and an approximation is applied in the arithmetic.

C. Edge Detection algorithm
An edge in an image is a contour across which the brightness of the image changes abruptly. However, image data is discrete, so edges in an image are often defined as the local maxima of the gradient [17]. The edges of an image are considered to be the most important image attributes that provide valuable information for human image perception. Edge detection refers to algorithms which aim at identifying points in a digital image at which the image brightness changes sharply. The basic edge-detection operator is a matrix-area gradient operation that determines the level of variance between different pixels. Examples of gradient-based edge detectors are the Roberts, Prewitt, and Sobel operators. All the gradient-based algorithms have kernel operators that calculate the strength of the slope in directions which are orthogonal to each other, commonly vertical and horizontal; the different components of the slopes are then combined to give the total value of the edge strength.

These algorithms consist of a 2-D first-derivative operator applied to the grey-scale image to highlight regions of the image with high first spatial derivatives. The edges are translated into ridges in the gradient magnitude of the image. The algorithm tracks along the top of these ridges and sets to zero all pixels that are not actually on a ridge top, giving a thin line in the output. The edge detection of an image is the convolution product of the image pixels with different masks, which results in the calculation of the horizontal and the vertical gradient. The two gradients are calculated using differences between adjacent pixels.

Prewitt Edge Detection
One way to find edges is to use the Prewitt kernels. The Prewitt kernels are based on the idea of the central difference and give equal weight to all pixels when averaging. The vertical and horizontal kernels for the Prewitt algorithm are given in Fig 9.

Fig 9: Prewitt Edge Detector – Horizontal (Gx) & Vertical (Gy) kernel.
These convolutions are applied to the grey-scale image to get the horizontal (Gx) and the vertical (Gy) gradients. These kernels can then be combined together to find the absolute magnitude of the gradient at each point. These kernels are, however, sensitive to noise. The gradient magnitude is given by

|G| = √(Gx² + Gy²) ≈ |Gx| + |Gy|    (2)

and the gradient direction is given by

θ = tan⁻¹(Gy / Gx)    (3)

Sobel Edge Detector
The Sobel algorithm provides a differencing as well as a noise-smoothing operation in a single kernel; thus, the noise sensitivity of plain first-gradient operations can be avoided by the use of this algorithm. The Sobel operator only considers the two orientations of 0 and 90 degrees. The vertical and horizontal kernels for the Sobel algorithm are given in Fig 10.

Fig 10: Sobel Edge Detector – Horizontal (Gx) & Vertical (Gy) kernel.

With the Sobel operator, the lines corresponding to edges become thicker compared with the Roberts Cross and Prewitt operator outputs, due to the increased smoothing of the Sobel operator [18]. The FPGA allows implementation of these algorithms with a parallel architecture; the hardware architecture is shown in Fig 11.

Fig 11: Hardware Architecture of edge detection algorithm

The architecture consists of four major blocks. First, the input pixels are passed through the image delay line, which shifts the incoming image pixels through line buffers to create a delay line; the buffer depth depends on the number of pixels in each line, and the number of buffer lines depends on the size of the convolution kernel. The pixels are forwarded to the vertical and horizontal kernels, which perform the convolution operation with the pixels, and the results are forwarded to the combining block. This block combines the results of the horizontal and vertical convolutions; its final output is the sum of the absolute values of the results of the horizontal and vertical convolution. The output of the combining block may contain some spurious noise; by controlling the threshold value in the thresholding block, the effect of this noise can be reduced. Finally, the output pixels are taken from the thresholding block.
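A compact software model of this edge-detection pipeline is sketched below (Python/NumPy). It uses the standard Sobel kernels, combines the two gradients as |Gx| + |Gy| in the spirit of equation (2), and applies a final threshold; the kernel values and the threshold of 128 are assumptions of this illustration, not values taken from the reviewed design.

import numpy as np

# Standard Sobel kernels for the horizontal (Gx) and vertical (Gy) gradients.
SOBEL_GX = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=np.int32)
SOBEL_GY = np.array([[-1, -2, -1],
                     [ 0,  0,  0],
                     [ 1,  2,  1]], dtype=np.int32)

def sobel_edges(image, threshold=128):
    """Apply both kernels to each 3x3 window, combine the gradients as
    |Gx| + |Gy| (combining block), and threshold the result to suppress
    spurious noise (thresholding block)."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = image[y - 1:y + 2, x - 1:x + 2].astype(np.int32)
            gx = int(np.sum(win * SOBEL_GX))
            gy = int(np.sum(win * SOBEL_GY))
            out[y, x] = 255 if abs(gx) + abs(gy) >= threshold else 0
    return out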
VI. MOTION BASED IMAGE ALGORITHMS

A. Sum of Absolute Differences (SAD) Algorithm
The Sum of Absolute Differences (SAD) algorithm is a simple area-based image matching technique for determining the correlation between two images [19]. It is used in stereo imaging, where two cameras image a scene from two different locations, so that a physical point appears in a different location in each camera image. The algorithm compares a window in one image with every possible window in the other image. The relative pixel offset between a window area and its best match (greatest correlation) gives the value of the stereo disparity. This is repeated for every window of the initial image, with a greater disparity indicating that the object is closer to the cameras. The hardware required to capture stereo images can be implemented inexpensively and with greater computational performance by using FPGAs. This algorithm can be formulated as a window-based operator, though some aspects must be considered:
a) The coefficients of the window mask are variable, and new windows are extracted from the first image to constitute the reference block. Once the processing in the search area has been completed, the window mask must be replaced with a new one, and the processing goes on in the same way until all data is processed.
b) The different windows to be correlated are extracted in a column-based order from the search area to exploit data overlapping and sharing. The pixels are broadcast to all the processors so that they work concurrently.
When the SAD value is processed, data is available in row format; therefore, when blocks are processed vertically, previously read data in the search area is overlapped between two block searches. The SAD algorithm is shown in Fig 12. The pixel values of the left and right stereo images are subtracted, and the absolute value of these differences is taken. This absolute value is then summed along each three-pixel column and row; when these operations are completed, each PE contains the 3×3 SAD value for that disparity. The disparity is then increased (one of the images is shifted across by one pixel) and the operations are repeated. The disparity that gave the lowest SAD becomes that pixel's value in the final disparity map. The generated disparity map then gives an indication of the relative distance of each image pixel from the cameras.

Fig 12: SAD Algorithm. a) Left and Right Image Differences. b) Absolute Value of Left and Right Image Differences. c) Column SAD. d) Sum of Absolute Differences for a Single Disparity Level.
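The window-based SAD matching described above can be summarised by the following software sketch (Python/NumPy). The 3×3 window, the maximum disparity of 16 and the left-to-right search direction are illustrative assumptions rather than parameters of the surveyed implementation.

import numpy as np

def sad_disparity(left, right, max_disparity=16, r=1):
    """For each pixel of the left image, compare its (2r+1)x(2r+1) window
    with the correspondingly shifted window in the right image at every
    candidate disparity, and keep the disparity with the lowest sum of
    absolute differences."""
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=np.uint8)
    for y in range(r, h - r):
        for x in range(r, w - r):
            best_d, best_sad = 0, None
            for d in range(0, min(max_disparity, x - r) + 1):
                lwin = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
                rwin = right[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(np.int32)
                sad = int(np.abs(lwin - rwin).sum())      # sum of absolute differences
                if best_sad is None or sad < best_sad:
                    best_d, best_sad = d, sad
            disparity[y, x] = best_d                      # best-match offset = disparity
    return disparity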