
White Paper

Agilex™ 5 FPGA
Industrial Cameras, Medical Endoscopy, and Surveillance or Security Cameras

High-performance Image Signal Processing and Camera Sensor Pipeline Design on FPGAs
Authors

Alexey Lopich, Director of Engineering
Nerhun Yildiz, Computer Vision Engineer
Bill Dries, Computer Vision Team Lead
Kyle Griffiths, Computer Vision Engineer
Blaise Robson, Computer Vision Engineering Intern

Embedded Acceleration Division, Altera Corporation, Marlow, UK

Abstract

Image processing is ubiquitous in embedded applications. It plays a central role in turnkey systems used for medical imaging, machine vision and robotics, video conferencing, surveillance and security, automotive solutions, and many other industrial applications. While ASICs have traditionally underpinned high-scale, low-power imaging products such as smartphones, FPGAs have conventionally been enablers for many embedded domains thanks to their inherent scalability and parallelism. They support constantly evolving image sensor interfaces and formats (RGBIr, RGBW, etc.), a broad spectrum of connectivity standards, flexible multi-sensor architectures, and high-performance image processing and compression. Combined with a rapid development cycle, these traits make FPGAs ideally suited to continuously evolving embedded imaging applications. This paper presents an FPGA design methodology for developing individual IPs and camera pipeline systems suitable for high-performance (up to 4.8 GPix/sec) and power-efficient embedded applications. The framework covers advanced image signal processor (ISP) functionality such as deterministic 'glass-to-glass' latency, geometric distortion correction, HDR, AI and video analytics integration, image compression, sensor interfacing, and video-over-IP connectivity. We outline a complete hardware- and software-configurable architecture that enables complex, high-performance, real-life camera systems with high device utilization ratios and arbitrary connectivity.
Table of Contents

Abstract
Introduction
Optical System Selection
FPGA System Components
Software Stack
AI Integration
Additional Options
Conclusions
References

Introduction

Over the years, significant advancements in imaging technology have made high-resolution imaging ubiquitous in embedded applications. Continuous demand for higher resolutions, framerates, dynamic range, and content analysis drives technological advances in all parts of video content handling – from capture to processing (especially AI) to transmission, display, and storage. FPGAs have long underpinned equipment involved in all stages of image processing, such as high-end cameras, ProAV and broadcasting equipment, video capture cards, transcoding, and playback. There are many reasons why using FPGAs is advantageous, especially where off-the-shelf ASIC-based products don't exist. Inherent parallelism enables all stages of the image processing pipeline to execute concurrently, including multi-sensor and AI-infused designs, yielding significant savings on memory and bandwidth requirements compared to GPU implementations. Reduced dependency on external RAM also supports deterministic low-latency implementation, which is crucial for time-sensitive applications such as medical imaging, industrial robotics, automotive, and surveillance. Furthermore, such parallelism and reduced memory requirements allow algorithms and designs to be optimized for performance, power efficiency, and resource utilization [1], resulting in higher performance-per-watt efficiency than
CPU and GPU implementations. Arguably, one of the most significant advantages of FPGAs is their intrinsic customization and IO flexibility. This is especially critical for image processing applications, where sensors, interfaces, algorithms, and standards change over time, allowing for future-proof solutions. High-speed interfaces like Ethernet and PCIe enable efficient data transfer between FPGA-based image acquisition solutions and external systems.

In previous work, we outlined a general methodology for designing video processing IPs with high fMAX (above 600 MHz) for high-throughput applications [1]. That methodology is equally applicable to imaging solutions. However, designing an FPGA-based camera solution requires considering many factors besides core processing performance. The rest of this paper is organized as follows: Section II provides an overview of optical module selection options, Section III describes the overall FPGA camera solution architecture covering ingest, processing, and output options, Section IV covers the necessary ISP software stack, Section V outlines the process of incorporating an AI inference flow into a camera solution, Section VI discusses additional options, and Section VII provides conclusions.
Optical System Selection

Image Sensor

Creating a camera system starts with selecting imaging sensors and optics. The optics' field of view and focal length must match the desired application requirements and be compatible with the imaging sensor.

Optical zoom and focus requirements also dictate this selection process, which extends to sensor selection, where auto-focus features such as Phase Detection Auto-Focus (PDAF), dual-pixel, or quad-pixel technologies must be considered.
Sensors usually have a certain Color Filter Array (CFA) pattern, where each pixel captures only a specific color. Typical CFA patterns include commercial Bayer/RGGB, industrial/automotive RCCC or RCCB, industrial/security RGBIr, the less commonly used RGBW, and many more, where R, G, B, C, Ir, and W stand for red, green, blue, clear, infrared, and white pixels, respectively. The CFA pattern might also have different base pixel periods, such as 2x2 or 4x4, which may have further flavors like quad high-dynamic-range (HDR) [2] and 2x1 or 2x2 On-Chip Lens (OCL) [3]. Some applications benefit from monochrome sensors, such as grayscale, depth, and near-infrared or short-wavelength infrared sensors [4] for robotics, defense, and industrial use.

Another set of considerations is HDR requirements, where many sensor technologies exist for merging multiple exposures into a single frame to extend the sensors' usual 10-12 bit linear Analog-to-Digital Converter (ADC) outputs to 16-24 bits.
Sensor Interfaces

The interface for connecting the image sensor to the FPGA is crucial. Such a consideration is essential from the support perspective: can the sensor be connected directly to the FPGA, or will additional bridging devices be required? Direct connectivity has the advantage of lower design cost and ease of implementation. However, it depends on factors such as throughput and electrical IO compatibility and the availability of PHY and relevant protocol IPs and corresponding drivers. Designers also need to consider the control interface (for example, I2C or SPI) used to program the sensor and read back its runtime parameters. Common interfaces include MIPI CSI-2, SLVS-EC, LVDS, parallel interfaces (CMOS), and serial interfaces (SPI or I2C). The interface choice depends on data rate, signal integrity, noise immunity, cost and ease of integration, and compatibility between the image sensor and the receiving device. The throughput of the sensor also limits the choice of video interface; for example, an uncompressed 20-bit linear 8K video stream at 120 frames per second requires roughly 79.6 gigabits per second of bandwidth, which typically results in a highly parallel LVDS interface.
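To make the arithmetic concrete, the link budget above can be reproduced with a few lines of Python (illustrative only; the resolution, framerate, and bit depth are the figures quoted above):

```python
# Back-of-the-envelope link budget for the 8K120 example above.
width, height = 7680, 4320      # 8K UHD
fps = 120
bits_per_pixel = 20             # 20-bit linear samples

pixel_rate = width * height * fps            # pixels per second
raw_bandwidth = pixel_rate * bits_per_pixel  # bits per second

print(f"pixel rate:    {pixel_rate / 1e9:.2f} Gpix/s")   # ~3.98 Gpix/s
print(f"raw bandwidth: {raw_bandwidth / 1e9:.1f} Gbps")  # ~79.6 Gbps

# At 4.5 Gbps per MIPI CSI-2 lane this would need ~18 lanes,
# which is why such sensors typically use highly parallel LVDS.
print(f"equivalent CSI-2 lanes: {raw_bandwidth / 4.5e9:.1f}")
```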

FPGA System Components

A block diagram of the presented camera solution is provided in Fig. 1. At a high level, the main components of an imaging system include the input interface and front end, the image signal processor (ISP) pipeline, the AI engine, the Hard Processor System (HPS), the application ISP software stack, the memory subsystem, and the output or data offload interfaces. In the generic case, the ISP system is a hardware-software co-design whose hardware components can be designed in RTL or HLS, or picked from a pre-existing library from an FPGA vendor. In the presented work, we utilize components from Altera's Video and Vision Processing (VVP) Suite [5] and other Altera® Quartus Prime Pro IPs for video connectivity, memory management, and overall control. The HPS runs the software stack that controls the hardware and implements feedback mechanisms like Auto-Exposure (AE) and Auto-White-Balance (AWB), as well as providing a Graphical User Interface (GUI) for overall system control.

Figure 1. 4K Camera Solution on Agilex 5 FPGA

Input Connectivity IPs

The detailed block diagram of the camera solution design is presented in Fig. 2. The primary sensor input interface of the ISP is MIPI CSI-2 [6], a high-speed serial interface supporting high data rates (up to 4.5 Gbps per lane). It is a standardized interface with wide adoption across various industries and, crucially, a wide variety of sensor modules available for prototyping. The Agilex™ 5 E-Series and D-Series FPGAs natively support MIPI interfaces.
Figure 2. Block diagram of the presented 4K camera solution

Other camera interface options, such as USB, GigE, or CoaXPress [7], can be enabled via a video switch for demonstration and debugging purposes. In the presented work, we extended the input options with a video frame reader, a Test Pattern Generator (TPG), and HDMI Rx. The HDMI input is converted to an RGB format. The TPG and HDMI video streams are multiplexed via a video switch and passed through a simple Remosaic IP that converts the RGB signal to a Bayer format by dropping two color channels of every pixel, following a CFA pattern matching that of the sensor. A second switch IP multiplexes the synthetic Bayer video with video streams taken from the sensor over the MIPI interface and from external memory via a video frame reader.
Image Signal Processing Pipeline

In this section, we describe the main functional blocks of an ISP pipeline. This work presents a generic pipeline suitable for a wide range of applications. However, the final implementation of the imaging solution and the sequence of processing blocks depend on many factors, such as sensor specification, application requirements, performance versus size/power constraints, and many more. We review such additional considerations in Section VI.
Black Level Statistics (BLS) extracts the black-level information of the sensor from a region of interest programmed over an unilluminated section of the sensor. This block provides information for monitoring the stability of the black-level offset (pedestal) value set on the sensor. This offset leaves a small portion of the dynamic range as foot room for negative pixel values, accommodating noise in the dark pixel values. Saturating those pixel values to zero at this stage would cause an imbalanced noise distribution that manifests as imaging artifacts. Subsequently, a Clipper function is used to crop the image to the desired size by removing unilluminated pixels and the excess pixels used by sensor manufacturers to make the frame wider or taller.
The Defect Pixel Correction (DPC) function removes impulse pixel noise while preserving detail by analyzing each input pixel against its neighboring pixels of the same color and replacing outliers.
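As an illustration of the principle (not the actual kernel of the VVP DPC IP, which is not detailed here), a naive same-color outlier test on a Bayer mosaic might look as follows:

```python
import numpy as np

def dpc_clamp(bayer, threshold=64):
    """Replace impulse-noise pixels by comparing each pixel against its
    eight same-color neighbors (2 pixels away on a Bayer mosaic)."""
    raw = bayer.astype(np.int32)
    out = raw.copy()
    h, w = raw.shape
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            neigh = raw[y-2:y+3:2, x-2:x+3:2].flatten()
            neigh = np.delete(neigh, 4)          # drop the center pixel
            lo, hi = neigh.min(), neigh.max()
            # Only replace clear outliers (impulse noise), preserving detail
            if raw[y, x] > hi + threshold or raw[y, x] < lo - threshold:
                out[y, x] = np.median(neigh)
    return out
```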

Adaptive Noise Reduction (ANR) is a general noise reduction function that reduces the graininess of the video by removing various types of temporal sensor noise (as opposed to Fixed Pattern Noise (FPN)). Many factors contribute to temporal noise: photon shot noise, pixel dark current, readout noise, amplifiers, and quantization noise. We limit ANR to spatial noise reduction techniques in the presented work, leaving temporal analysis for future upgrades.

After DPC and ANR, the noise level of the video image is lowered, and the signal-to-noise ratio (SNR) is improved. The subsequent functions of the pipeline are designated for color enhancement and signal improvement. These operations are susceptible to non-linearities like an artificial black-level pedestal on the video signal. Therefore, removing the black pedestal value at this stage is necessary. Based on BLS statistics, the Black Level Correction (BLC) block removes the black-level pedestal value from the pixel values and scales the signal back to the full dynamic range.
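The arithmetic is simple; the following sketch illustrates one plausible formulation, assuming a 12-bit pipeline (the IP's exact fixed-point behavior may differ):

```python
import numpy as np

def black_level_correct(pixels, pedestal, full_scale=4095):
    """Remove the black-level pedestal and rescale to full range.
    `pedestal` would come from the BLS measurement; 12-bit range assumed."""
    corrected = pixels.astype(np.float64) - pedestal
    corrected = np.clip(corrected, 0, None)         # clip residual negative noise
    gain = full_scale / (full_scale - pedestal)     # restore full dynamic range
    return np.clip(corrected * gain, 0, full_scale).astype(np.uint16)
```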
The Vignette Correction (VC) function aims to address deficiencies of the optical module. The optical system generally accumulates less light at the video frame's corners than at the center. This introduces various vignetting effects, which strengthen as the field of view becomes wider. The primary purpose of the VC is to use a mesh of coefficients to correct these vignetting effects, with a secondary application of correcting certain classes of fixed-pattern noise that the sensor might introduce.

The White Balance Statistics (WBS) block divides an image within a predefined region of interest (ROI) into zones and groups each 2x2 Bayer pixel in the zone into macro pixels. The R/G and B/G ratios are calculated for each macro pixel and accumulated across each zone. A configurable pair of thresholds is used to reject ratios that are too small or too large, since those values do not represent a white area. The software uses these statistics to predict the light source's color temperature. The CFA dyes used in sensor manufacturing have different light transmission characteristics that cause imbalanced intensity values across color channels. The primary use of the White Balance Correction (WBC) function is to correct this intensity imbalance across color channels. However, this imbalance also depends on the light source's color temperature. The automatic white balance (AWB) software applies the appropriate WBC coefficients from the color temperature prediction it made using WBS.
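A simplified software model of this statistics gathering, assuming an RGGB mosaic and the 7x7 zone grid used later by the AWB software, might look as follows (illustrative only; thresholds and zone counts are hypothetical defaults):

```python
import numpy as np

def wbs_zone_stats(bayer, zones=(7, 7), lo=0.25, hi=4.0):
    """Illustrative White Balance Statistics over an RGGB mosaic:
    per-zone mean R/G and B/G ratios, rejecting non-white macro pixels."""
    r = bayer[0::2, 0::2].astype(np.float64)
    g = (bayer[0::2, 1::2] + bayer[1::2, 0::2]) / 2.0   # average the two greens
    b = bayer[1::2, 1::2].astype(np.float64)
    rg = r / np.maximum(g, 1.0)
    bg = b / np.maximum(g, 1.0)
    zh, zw = rg.shape[0] // zones[0], rg.shape[1] // zones[1]
    stats = np.zeros((*zones, 2))
    for zy in range(zones[0]):
        for zx in range(zones[1]):
            zrg = rg[zy*zh:(zy+1)*zh, zx*zw:(zx+1)*zw]
            zbg = bg[zy*zh:(zy+1)*zh, zx*zw:(zx+1)*zw]
            # Reject ratios outside the plausible-white window
            keep = (zrg > lo) & (zrg < hi) & (zbg > lo) & (zbg < hi)
            if keep.any():
                stats[zy, zx] = [zrg[keep].mean(), zbg[keep].mean()]
    return stats  # software maps these to a color-temperature estimate
```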
The Demosaic function is the last block in the pipeline that operates in the Bayer domain. It converts the CFA video stream into an RGB video stream by generating the two missing color channels of each pixel via interpolation. The rest of the pipeline operates on RGB data.

The Histogram Statistics (HS) block calculates two histograms in the luma domain, one across the whole frame and one across a programmable ROI. The auto-exposure (AE) software uses these statistics to calculate the sensor's exposure time, analog gain, and digital gain values.

The Unsharp Mask (USM) Filter enhances the image's sharpness by boosting or softening the image's high-frequency components, thus increasing the perceived detail of the scene.
The CFA filters used to create individual colors on the sensor are sensitive to a spectrum of light wavelengths that does not align perfectly with the red, green, and blue responses of human vision. Another factor contributing to this misalignment is that each registered color sample is affected by the light of the other two channels. The problem can be corrected by multiplying the [R, G, B] vector with a 3x3 Color Correction Matrix (CCM). A traditional video processing Color Space Converter (CSC) IP is used to apply the CCM coefficients. Like the white balance coefficients, the CCM coefficients depend on the sensor characteristics and the light source's color temperature.
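The correction itself is a per-pixel matrix multiply; a minimal sketch (the matrix values here are hypothetical placeholders; real coefficients come from the per-sensor calibration described later):

```python
import numpy as np

# Hypothetical 3x3 CCM; rows sum to 1 so neutral grays are preserved.
CCM = np.array([[ 1.55, -0.40, -0.15],
                [-0.25,  1.45, -0.20],
                [-0.05, -0.50,  1.55]])

def apply_ccm(rgb, ccm=CCM):
    """Multiply every [R, G, B] vector (normalized 0..1) by the CCM."""
    h, w, _ = rgb.shape
    out = rgb.reshape(-1, 3) @ ccm.T        # per-pixel matrix multiply
    return np.clip(out, 0.0, 1.0).reshape(h, w, 3)
```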
Most optical systems introduce geometric distortion on the image by rounding the corners of the frame, which is more prominent with large-field-of-view lenses such as fish-eye. To perform geometric distortion correction (GDC), we use the Warp IP (Fig. 3) and application software that maps a correction mesh onto the IP to obtain a rectilinear image where straight lines remain straight. Another benefit of the Warp IP is that it acts as a video frame buffer and scaler. It isolates the input and output video streams, eliminating the need to synchronize them and letting the design run in free-running mode.

Figure 3. Geometric Distortion Correction implemented by Warp IP

High Dynamic Range (HDR) Considerations

The human visual system can adapt to a very high dynamic range, but at any given time, we may only perceive a limited portion of it. One way of adjusting the image's dynamic range to human vision is tone mapping, where the contrasts of darker and lighter areas are enhanced. It can be achieved using specialized IPs that perform an adaptive, content-aware, real-time local tone mapping operator (TMO) (Fig. 4).

Figure 4. Tone Mapping Operator IP
To perform advanced color space conversion, the designer must consider using a 3D look-up table (3D LUT). While the CSC IP can transform color spaces linearly, it is insufficient for applications like generating coverage of the BT.2020 color space for HDR or constraining a wide color space to a more limited display like sRGB. Those non-linear operations require the advanced color manipulation capabilities of a 3D LUT to prevent false colors [8].

The final block in the core ISP pipeline is the 1D LUT, whose primary purpose is applying an Opto-Electronic Transfer Function (OETF) suitable for the video sink used. Applying the correct OETF is essential for generating correct light intensities across a variety of displays supporting different Electro-Optical Transfer Functions (EOTF), from legacy gamma curves for SDR to the Perceptual Quantization (PQ) and Hybrid Log-Gamma (HLG) curves used for HDR. The secondary purpose of the 1D LUT IP is to apply an artistic effect on the output image, like a 'toe and knee' curve, to gracefully roll off the highlights and shadows, acting as a global tone mapping operation complementing the TMO.
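As a minimal illustration of the mechanism, the sketch below builds and applies a simple power-law OETF LUT (real deployments would select sRGB, PQ, or HLG curves as described above):

```python
import numpy as np

def build_oetf_lut(size=1024, gamma=2.2):
    """Build a 1D LUT applying a simple power-law OETF to linear input."""
    x = np.linspace(0.0, 1.0, size)
    return np.power(x, 1.0 / gamma)

def apply_lut(linear_rgb, lut):
    """Index linear [0,1] pixel values into the LUT (nearest entry)."""
    idx = np.clip((linear_rgb * (len(lut) - 1)).astype(int), 0, len(lut) - 1)
    return lut[idx]
```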

Output Connectivity IPs and Other Functions

The intermediate output of the ISP after the 3D LUT is connected to a video frame writer used as a frame grabber to write frames to external DDR memory.

A Mixer overlays the output of a custom icon generator onto the ISP output and adds a letterboxing pattern or color generated by the TPG.

The output of the 1D LUT is converted from lite to full streaming video [9] and passed to an HDMI or DisplayPort (DP) Tx connectivity IP.

Software Stack
The stack comprises application software built on several layers of APIs and IP drivers, packaged with Yocto Linux recipes, for the presented camera solution. A discovery mechanism scans
the Avalon CPU interface on the FPGA and creates a device
tree at boot-up time. Such an approach enables the
decoupling of hardware and software development by
eliminating the need to synchronize static device trees across
different FPGA builds.

Application Software
The application software consists of individual modules
responsible for driving individual FPGA IPs. These SW
modules are disabled if their corresponding IPs are not in the
design. Most IPs have a dedicated section on a web server-
based GUI to configure the camera system. Input and output
resolutions, exposure and color correction modes/settings,
modes of individual IPs, and bypass functionalities are
examples of the settings exposed on the GUI (Fig. 5). It also
allows for fine-tuning more complicated IPs, such as creating
a custom Warp mesh, applying various OETF functions in 1D
LUT, uploading 3D LUT cube files, and visualizing the
histograms.
The back end of the application software is also responsible
for supporting multiple feedback loops across various
portions of the imaging pipeline. Some of those control loops
are self-contained, like the Warp software applying a
tessellation to the Warp IP, while others, like resolution
change, trigger a change for most of the IPs in the pipeline.

Auto White Balance


One of the significant ISP-specific control loops is the AWB
algorithm, which reads the sum of R/G and B/G ratios for 7x7
zones from the WBS IP and estimates the scene’s color
temperature. The color temperature of an ideal black body
radiation light source lies on the Planckian locus on a
chromaticity diagram. The ratios of colors can be mapped perfectly onto the Planckian locus under ideal conditions, where the light source is an ideal black body and the scene contains white content. Most modern LED light sources have a low Color Rendering Index (CRI), and some LED and older fluorescent light sources cannot be characterized by CRI at all. For
challenging scenes, AWB can be disabled, and manual or
fixed white-balance modes can be engaged, where the user
sets the color temperature or an earlier estimate is kept
constant, respectively. Finally, the AWB algorithm looks up
a set of WBC and CCM coefficients from its calibration tables
using the color temperature, where it interpolates the
coefficients read from the two closest calibration points in
the tables.
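The table lookup and interpolation step can be sketched as follows (the calibration temperatures and gain values are hypothetical; linear interpolation in color temperature is a simplification of what a production AWB might do):

```python
import numpy as np

# Hypothetical calibration table: color temperature (K) -> WBC gains (R, G, B).
CALIB = {2800: np.array([2.05, 1.00, 1.35]),
         4000: np.array([1.70, 1.00, 1.80]),
         5600: np.array([1.45, 1.00, 2.20]),
         6500: np.array([1.35, 1.00, 2.45])}

def interp_wbc(cct):
    """Interpolate WBC gains between the two nearest calibration points."""
    temps = sorted(CALIB)
    cct = min(max(cct, temps[0]), temps[-1])    # clamp to the table range
    hi = next(t for t in temps if t >= cct)
    lo = max(t for t in temps if t <= cct)
    if lo == hi:
        return CALIB[lo]
    w = (cct - lo) / (hi - lo)
    return (1 - w) * CALIB[lo] + w * CALIB[hi]

print(interp_wbc(5000))  # blend of the 4000 K and 5600 K entries
```

The same lookup drives the CCM selection, since both coefficient sets are stored per calibration point.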

Auto Exposure and Black Level Balancing


Another primary ISP-specific control loop is AE, which creates a feedback loop between HS and the sensor exposure controls. This process controls the image sensor's exposure time and analog and digital gains. The autoexposure targets a mean pixel brightness (0.4 of the maximum by default) while minimizing pixels with extreme brightness or darkness. Thresholds for extreme values are dynamically adjusted by the autoexposure based on the shape and width of the luma histogram. It is also responsible for balancing global exposure versus exposure within an ROI set by the user.

Figure 5. Camera Solution GUI: a – input options; b – ISP configuration; c – output settings; d – sensor calibration; e – real-time statistics control.
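One plausible formulation of such a loop is a damped multiplicative controller (a sketch under assumed parameter names; the production AE also drives analog and digital gain and the dynamic thresholds described above):

```python
def ae_step(mean_luma, exposure_us, target=0.4,
            exp_min=20, exp_max=33000, damping=0.5):
    """One iteration of a hypothetical AE loop: nudge exposure so the
    mean luma (normalized 0..1 from the HS histogram) approaches target."""
    if mean_luma <= 0.0:
        return exp_max                    # fully dark: open up completely
    ratio = target / mean_luma            # ideal multiplicative correction
    step = ratio ** damping               # damp to avoid oscillation
    return min(max(exposure_us * step, exp_min), exp_max)

# e.g. scene reads 0.2 (too dark) at 10 ms: exposure rises toward ~14 ms
print(ae_step(0.2, 10000))
```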

A final ISP-specific control loop is the Black Level Balancing (BLB) algorithm, which monitors BLS and applies the measured black levels to the BLC IP. If the measured black level is too far from the pedestal value requested from the sensor, the BLB algorithm compensates by adjusting the sensor pedestal value.
Sensor Calibration

The presented camera pipeline is robust and capable of processing video images from various sensors with the same CFA pattern. However, each sensor has its own characteristics and registers, which almost always differ drastically from other sensors. The presented ISP software stack addresses such differences via a calibration process.

While organizations like the European Machine Vision Association attempt to standardize the measurement and presentation of sensor characteristics [10], most sensor manufacturers treat this information as a trade secret. In this work, we minimized the measurements required to obtain good results instead of relying on sensor manufacturer information.
The noise calibration process involves capturing a perfectly unilluminated black scene and, at minimum, two frames of a static scene, using combinations from a set of exposure times and a set of analog gains. For each exposure time and analog gain pair, the standard deviation of the dark noise σd is measured from the black capture, and the histogram of the difference of the static scene captures is used to calculate the standard deviation of the shot noise σs:

σs = √(σt² − σd²) / kd

where kd is the analog gain and σt is the standard deviation of the total noise. The results further differ across pixel colors and a range of temperatures, which should be considered for the relevant applications and are beyond the scope of this work. The ISP application software keeps ANR in a feedback loop by applying the correct noise parameters for the sensor's exposure state.
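A software model of one calibration point, assuming the quadrature noise relation given above and that a single standard deviation (rather than the full histogram-derived distribution) suffices for illustration:

```python
import numpy as np

def noise_calibration(black, frame_a, frame_b, k_d):
    """Estimate dark and shot noise for one (exposure, analog gain) pair.
    Assumes sigma_t^2 = sigma_d^2 + (k_d * sigma_s)^2 as reconstructed above."""
    sigma_d = black.astype(np.float64).std()
    # Differencing two captures of a static scene cancels the scene content
    # and fixed-pattern noise; the difference variance is twice the temporal
    # variance of a single frame.
    diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    sigma_t = diff.std() / np.sqrt(2.0)
    sigma_s = np.sqrt(max(sigma_t**2 - sigma_d**2, 0.0)) / k_d
    return sigma_d, sigma_s
```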
The WBC and CCM coefficients depend on the light source’s mode, and a buffer may be loaded. At the same time, the
color temperature; therefore, a set of images is taken with a other one is operational, resulting in zero wait cycles for the
set of light sources with known color temperature. The color suitable tensor geometries. Loading coefficients in parallel
temperature guessed by the AWB algorithm or entered is also possible by introducing 2 wait cycles. Agilex 5 DSPs
manually drives the software to read the two closest WBC also increase Agilex 7’s 4 int9 multiply-add fixed-point DSP
and CCM coefficients and to interpolate between them functionality to 6 int9 multiply-adds that may benefit AI
before configuring their respective IPs. operations that require semi-random access.

For the VC mesh, a perfectly uniform scene is captured through a high-quality diffuser. The inverse of the capture, averaged across blocks, yields the mesh coefficients. Any operation that changes the focal distance changes the vignetting effect. Therefore, it might also be desirable to calibrate a set of meshes for various focal distances and update the VC accordingly. Zooming and focusing cause focal distance changes, which may be mitigated with expensive optics, but that is not practical for most imaging applications.
The GDC mesh is calculated similarly, preferably from a scene containing a grid with straight lines. The Warp GUI provides a few interfaces for easy manual calibration, while it might also be preferable to calculate the mesh coefficients offline with more automation.

The rest of the IPs do not require calibration. However, in some applications, creating more calibration tables to drive additional feedback loops might be desirable.

AI Integration

Since AI/ML revolutionized image processing and analysis across multiple fields, its inclusion in vision systems and camera solutions has become ubiquitous. Its industrial and machine vision applications span from object detection, recognition, and classification/facial recognition to scene segmentation, defect detection, medical analysis and diagnosis, and many more. AI techniques are not limited to high-level, complex analysis and automation, like number plate recognition, navigation, retail analytics, and medical diagnosis. The application of AI-inspired methods to controlling and adjusting camera settings in real time has become essential for consumer and industrial applications, with examples such as intelligent crop and zoom, scene-sensitive 3A (AF, AWB, AE), object-aware ROI video encoding, and depth reconstruction from mono cameras.

Traditionally, the leading platforms for deploying AI/ML inference solutions have been GPUs and NPUs, thanks to their performance advantages and the established ecosystem for model development, optimization, and ease of use. However, the use of FPGAs for AI computing in applications with flexible connectivity and sensor fusion requirements has been growing steadily [11]. There are several reasons for this. Unlike the devices mentioned above, where the hardware is fixed, the inherent flexibility of the internal interconnect and dynamic programmability make FPGAs a natural platform for fine-tuning an implementation for particular networks to strike the required balance between precision (binary, int4, int8, int9, FP16, etc.), cost (area, ALMs, power, etc.), and performance (TOPS, inferences/s, detection rate, etc.). Moreover, recent advances in DSP block architecture allow high parallelization of the operations that dominate ML inference. For example, the enhanced Agilex 5 DSPs merge the AI tensor blocks of Stratix 10 NX with the DSP blocks of Agilex 7. As a result, an Agilex 5 DSP can calculate 20 shared-exponent int8 multiply-add operations in one clock cycle with pre-loaded coefficients in tensor mode. The coefficients are side-loaded and dual-buffered in this mode, so one buffer may be loaded while the other is operational, resulting in zero wait cycles for suitable tensor geometries. Loading coefficients in parallel is also possible at the cost of 2 wait cycles. Agilex 5 DSPs also increase Agilex 7's fixed-point DSP functionality from 4 to 6 int9 multiply-adds, which may benefit AI operations that require semi-random access.
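To illustrate what shared-exponent (block floating point) int8 arithmetic means numerically, consider the following conceptual model (a software analogy, not the DSP block's actual datapath):

```python
import numpy as np

def shared_exponent_dot(a, w):
    """Conceptual shared-exponent int8 multiply-add: each vector shares one
    exponent, mantissas are int8, products accumulate in a wide integer."""
    def quantize(x):
        exp = int(np.ceil(np.log2(np.abs(x).max() / 127.0)))  # shared exponent
        mant = np.clip(np.round(x / 2.0**exp), -128, 127).astype(np.int8)
        return mant, exp
    ma, ea = quantize(a)
    mw, ew = quantize(w)
    acc = np.dot(ma.astype(np.int32), mw.astype(np.int32))    # integer MAC chain
    return float(acc) * 2.0 ** (ea + ew)                      # rescale once

x, w = np.random.randn(20), np.random.randn(20)               # 20 MACs, as above
print(shared_exponent_dot(x, w), np.dot(x, w))  # close, within quantization error
```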

Furthermore, modern toolkits and methodologies, like Intel's OpenVINO™ and Altera's FPGA AI Suite [12], allow designers to streamline porting ML models developed in conventional frameworks like TensorFlow and PyTorch onto FPGAs. A set of tools helps optimize the model, produce an optimal architecture within specified constraints, and generate an inference IP ready to be integrated into the design and deployed in the field. The overall development process within the FPGA AI Suite is outlined in Fig. 6. Extending an existing design with AI capabilities requires straightforward hardware integration of the generated AI inference IP and the addition of a Video Frame Reader to overlay the required detection graphics, as depicted in Fig. 7. Software integration requires elaborate handling of the input frame queue, running inference, performing further analysis of the produced detections (such as spatiotemporal filtering, grouping, etc.), and synchronization with the real-time video stream via the frame buffer.

[Figure 6 depicts the flow: a trained AI model passes through the Model Optimizer into an Intermediate Representation (IR data, .xml/.bin) consumed by the Inference Engine plugin-runtime environment on an x86 or Arm* CPU, while the FPGA AI Suite Architecture Optimizer takes FPGA resource specifications (LEs/ALMs, DSP blocks, RAM blocks) to produce an architecture description (.arch), from which the FPGA AI Suite IP Generator emits the DLA IP (.qsys) integrated alongside other IP, custom RTL, and board files in Quartus® Prime Platform Designer.]
Figure 6. FPGA AI Suite Development Flow

Figure 7. Block diagram of the camera solution integrated with the AI Suite

Additional Options

This section describes extensions to the presented solution that are available either by using pre-existing plug-and-play IPs or by direct customization.
Alternative CFAs and infrared

One of the limitations of the presented solution is the lack of support for non-Bayer CFA patterns like RGBIr, RCCB, RCCC, and various flavors of quad CFA sensors. The presented architecture can be extended to cover all non-quad CFA patterns and create a secondary pipeline for the Ir channel using cores such as the Color Plane Manager (CPM) (Fig. 8). A separate grayscale pipeline might also benefit other monochrome sensor pipelines. Another option for the Ir channel is to mix it into the RGB channels in low-light conditions, or to recover colors from the RGB channels for applications without Ir cutoff filters.
applications without Ir cutoff filters.
detection with all pixels divided into left and right subpixels.
Another application-specific extension for the presented Yet another methodology is to use laser ranging to the
design is the enablement of an end-to-end HDR pipeline. subject, which might be a single beam or a more advanced
Extending the linear video pipeline limit of the processing lidar with a point cloud.
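A contrast-based AF loop can be sketched as follows (the focus metric and the `capture` hook are hypothetical; a real implementation would read the contrast detector's hardware statistics instead):

```python
import numpy as np

def contrast_score(gray):
    """Focus metric: variance of a simple Laplacian highpass response.
    Sharper images carry more high-frequency energy."""
    lap = (-4.0 * gray[1:-1, 1:-1] + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(np.var(lap))

def autofocus(capture, positions):
    """Hill-climb over focus motor positions; `capture(pos)` returns a
    grayscale frame (hypothetical camera hook). Stops once past the peak."""
    best_pos, best_score = positions[0], -1.0
    for pos in positions:
        score = contrast_score(capture(pos).astype(np.float64))
        if score > best_score:
            best_pos, best_score = pos, score
        elif score < 0.7 * best_score:   # well past the contrast peak
            break
    return best_pos
```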


Figure 8. Extending the presented camera solution with an infrared splitter and dedicated monochromatic ISP pipeline

Multi-sensor fusion and synchronization are other critical capabilities common in industrial and consumer applications. They are particularly useful for 3D imaging, 180- or 360-degree stitching, digital pan-tilt-zoom (PTZ), and multi-modal sensor fusion, such as mixing the visible spectrum with depth, infrared, LIDAR, etc. The deterministic latency of the presented ISP solution already enables multiple imaging streams to stay synchronous across the pipeline, enabling AR/VR applications out of the box. The next phase of this work is to extend the design to dual-sensor support.
Conclusions

This paper presents a design methodology for developing FPGA-based camera solutions for various applications using off-the-shelf IP components. Coupled with the outlined FPGA design techniques, the approach enables significant resource and power savings for high-throughput image-processing applications with up to 4.8 Gpixel/s per sensor. We cover critical components of the overall system, such as connectivity, the image processing pipeline, high-dynamic-range handling, and the software stack. Furthermore, optional extensions to the solution and AI functionality integration extend its direct application into multiple domains of machine vision.

References

[1] A. Lopich, et al., "A Framework for High-Performance Image and Video Processing on FPGAs." [Online]. Available: https://www.intel.com/content/www/us/en/content-details/813481/a-framework-for-high-performance-image-and-video-processing-on-fpgas.html
[2] Sony, "Quad Bayer Coding." [Online]. Available: https://www.sony-semicon.com/en/technology/mobile/quad-bayer-coding.html
[3] Sony, "All-pixel Auto Focus (AF) Technology." [Online]. Available: https://www.sony-semicon.com/en/technology/mobile/autofocus.html
[4] Sony, "SWIR Image Sensor." [Online]. Available: https://www.sony-semicon.com/en/products/is/industry/swir.html
[5] Altera, "Video and Vision Processing Suite." [Online]. Available: https://www.intel.com/content/www/us/en/products/details/fpga/intellectual-property/dsp/video-vision-processing-suite.html
[6] MIPI Alliance, "Camera Serial Interface 2 (MIPI CSI-2)." [Online]. Available: https://www.mipi.org/specifications/csi-2
[7] CoaXPress. [Online]. Available: https://www.coaxpress.com/
[8] Altera, "3D LUT IP for FPGA." [Online]. Available: https://www.intel.com/content/www/us/en/products/details/fpga/intellectual-property/dsp/3d-lut.html
[9] Altera, "Altera FPGA Streaming Video Protocol." [Online]. Available: https://www.intel.com/content/www/us/en/docs/programmable/683397/current/about-the-intel-fpga-streaming-video.html
[10] European Machine Vision Association, "EMVA 1288 Standard." [Online]. Available: https://www.emva.org/standards-technology/emva-1288/
[11] J. Ahmed, M. Jeervis, R. Venkata, "Altera® FPGAs and SoCs with FPGA AI Suite and OpenVINO Toolkit Drive Embedded/Edge AI/Machine Learning Applications." [Online]. Available: https://www.intel.com/content/www/us/en/content-details/765466/altera-fpgas-and-socs-with-fpga-ai-suite-and-openvino-toolkit-drive-embedded-edge-ai-machine-learning-applications-white-paper.html?DocID=765466
[12] Altera, "FPGA AI Suite: AI Inference Development Platform." [Online]. Available: https://www.intel.com/content/www/us/en/software/programmable/fpga-ai-suite/overview.html

Intel technologies may require enabled hardware, software or service activation.


No product or component can be absolutely secure.
Your costs and results may vary.
© Altera Corporation. Altera, the Altera logo, and other Altera marks are trademarks of Altera Corporation or its subsidiaries.
*Other names and brands may be claimed as the property of others.

WP-01337-1.0
