High-Performance ISP and Camera Sensor Pipeline Design on FPGAs Whitepaper
CPU and GPU implementations. Arguably, one of the most significant advantages of FPGAs is their intrinsic customization and IO flexibility. This flexibility is especially critical for image processing applications, where sensors, interfaces, algorithms, and standards change over time, allowing for future-proof solutions. High-speed interfaces like Ethernet and PCIe enable efficient data transfer between FPGA-based image acquisition solutions and external systems.

In previous work, we outlined a general methodology for designing video processing IPs with high fMAX (above 600 MHz) for high-throughput applications [1]. Such a methodology is equally applicable to imaging solutions. However, designing an FPGA-based camera solution requires considering many factors besides core processing performance. The rest of the content is organized as follows: Section II provides an overview of optical module selection options, Section III describes the overall FPGA camera solution architecture covering ingest, processing, and output options, Section IV covers the necessary ISP software stack, Section V outlines the process of incorporating an AI inference flow into a camera solution, Section VI discusses additional options, and Section VII provides conclusions.
Optical System Selection

Image Sensor

Creating a camera system starts with selecting imaging sensors and optics. The optics' field of view and focal length must match the desired application requirements and be compatible with the imaging sensor.

Optical zoom and focus requirements also dictate this selection process, which extends to sensor selection, where auto-focus features like Phase Detection Auto-Focus (PDAF), dual-pixel, or quad-pixel technologies must be considered.
The sensors usually have a certain Color Filter Array (CFA) pattern, where each pixel captures only a specific color. Typical CFA patterns include commercial Bayer/RGGB, industrial/automotive RCCC or RCCB, industrial/security RGBIr, the less commonly used RGBW, and many more, where R, G, B, C, Ir, and W stand for red, green, blue, clear, infrared, and white pixels, respectively. The CFA pattern might also have different base pixel periods like 2x2 or 4x4, which may have further flavors like quad high-dynamic-range (HDR) [2], 2x1 or 2x2 On-Chip Lens (OCL) [3], etc. Some applications benefit from monochrome sensors, such as grayscale, depth, and near-infrared or short-wavelength infrared sensors [4] for robotics, defense, and industrial applications.
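To make the CFA notation concrete, the short Python sketch below (illustrative only, not part of the presented design) splits a 2x2 RGGB mosaic into its four subsampled color planes; other CFAs permute or extend the same idea.

```python
import numpy as np

def split_rggb(raw: np.ndarray):
    """Split a Bayer RGGB mosaic into its four subsampled color planes.

    Assumes a 2x2 base pixel period with R at (0,0), G at (0,1) and
    (1,0), and B at (1,1). Patterns like RCCB or RGBIr swap channels;
    quad CFAs use a 4x4 base period instead.
    """
    r  = raw[0::2, 0::2]
    g1 = raw[0::2, 1::2]
    g2 = raw[1::2, 0::2]
    b  = raw[1::2, 1::2]
    return r, g1, g2, b

raw = np.random.randint(0, 2**12, size=(8, 8), dtype=np.uint16)  # 12-bit mosaic
planes = split_rggb(raw)
print([p.shape for p in planes])  # four (4, 4) planes
```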
Another set of considerations is HDR requirements, where many sensor technologies exist for merging multiple exposures into a single frame to extend the sensors' usual 10-12 bit linear Analog-to-Digital Converter (ADC) outputs to 16-24 bits.
Sensor Interfaces

The interface for connecting the image sensor to the FPGA is crucial. Such a consideration is essential from the support perspective: can the sensor be connected directly to the FPGA, or will additional bridging devices be required? Direct connectivity has the advantage of lower design cost and ease of implementation. However, it depends on factors such as throughput and electrical IO compatibility and the availability of PHY and relevant protocol IPs and corresponding drivers. Designers also need to consider the control interface (for example, I2C or SPI) for the sensor to program and read back the sensor's run-time parameters. Some common interfaces include MIPI CSI-2, SLVS-EC, LVDS, parallel interfaces (CMOS), and serial interfaces (SPI or I2C). The interface choice depends on data rate, signal integrity, noise immunity, cost and ease of integration, and compatibility between the image sensor and the receiving device. The throughput of the sensor also limits the choices of video interfaces; for example, an uncompressed 20-bit linear 8K video stream at 120 frames per second requires 79.6 Gbps of bandwidth, which typically results in a highly parallel LVDS interface.
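The bandwidth figure above follows from simple arithmetic; the back-of-the-envelope Python below reproduces it (assuming the common 7680x4320 geometry for 8K):

```python
# Link-budget sanity check for the 8K example above.
width, height = 7680, 4320      # assumed 8K UHD geometry
fps = 120                       # frames per second
bits_per_pixel = 20             # 20-bit linear samples

pixel_rate = width * height * fps        # ~3.98e9 pixels per second
bit_rate = pixel_rate * bits_per_pixel   # bits per second

print(f"pixel rate: {pixel_rate / 1e9:.2f} Gpixel/s")
print(f"bandwidth:  {bit_rate / 1e9:.1f} Gbps")   # -> 79.6 Gbps
```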
FPGA System Components

A block diagram of the presented camera solution is provided in Fig. 1. At a high level, the main components of an imaging system include the input interface and front end, the image signal processor (ISP) pipeline, the AI engine, the Hard Processor System (HPS) subsystem, the application ISP software stack, the memory subsystem, and the output or data offload interfaces.

In a generic case, the ISP system is a hardware-software co-design whose hardware components can be designed in RTL or HLS, or picked from a pre-existing library from an FPGA vendor. In the presented work, we utilize components from Altera's Video and Vision Processing (VVP) Suite [5] and other Altera® Quartus Prime Pro IPs for video connectivity, memory management, and overall control. The HPS is used to run the software stack that controls the hardware and implements feedback mechanisms like Auto-Exposure (AE) and Auto-White-Balance (AWB), as well as providing a Graphical User Interface (GUI) for overall system control.

Figure 1. 4K Camera Solution on Agilex 5 FPGA

Input Connectivity IPs

The detailed block diagram of the camera solution design is presented in Fig. 2. The primary sensor input interface of the ISP is MIPI CSI-2 [6]. MIPI CSI-2 offers a high-speed serial interface supporting high data rates (up to 4.5 Gbps per lane). It is a standardized interface with wide adoption across various industries and, crucially, with a wide variety of sensor modules available for prototyping. The Agilex™ 5 E-Series and D-Series FPGAs natively support MIPI interfaces.
Other camera interface options, such as USB, GigE, or CoaXPress [7], can be enabled via a video switch for demonstration and debugging purposes. In the presented work, we extended the input options to a video frame reader, Test Pattern Generator (TPG), and HDMI Rx. The HDMI input is converted to an RGB format. The TPG and HDMI video streams are multiplexed via a video switch and passed through a simple Remosaic IP that converts the RGB signal to a Bayer format by dropping two color channels of every pixel, following a CFA pattern matching that of the sensor. A second switch IP multiplexes the synthetic Bayer video with video streams taken from the sensor over the MIPI interface and from external memory via a video frame reader.

Image Signal Processing Pipeline

In this section, we describe the main functional blocks of an ISP pipeline. This work presents a generic pipeline suitable for a wide range of applications. However, the final implementation of the imaging solution and the sequence of processing blocks depend on many factors, such as sensor specification, application requirements, performance vs. size/power constraints, and many more. We review such additional considerations in Section VI.

Black Level Statistics (BLS) extracts the black-level information of the sensor from a region of interest programmed for an unilluminated section of the sensor. This block provides information for monitoring the stability of the black level offset/pedestal value set on the sensor. This offset value leaves a small portion of the dynamic range as foot room for negative pixel values to accommodate noise in the dark pixel values. Saturating those pixel values to zero at this stage would cause an imbalanced noise distribution that manifests as imaging artifacts. Subsequently, a Clipper function is used to crop the image to the desired size by removing unilluminated pixels and excess pixels used by sensor manufacturers to make the frame wider or taller.
The Defect Pixel Correction (DPC) function removes impulse pixel noise while preserving details by analyzing the input pixels against their neighboring pixels of the same color and replacing outliers.
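A behavioral model of this idea might look like the following Python sketch (a simplified illustration, not the DPC IP's actual algorithm), which operates on one CFA color plane so that all neighbors are same-color pixels:

```python
import numpy as np

def dpc(plane: np.ndarray, threshold: int = 200) -> np.ndarray:
    """Replace impulse-noise pixels with the median of their neighbors.

    A pixel is flagged as defective when it lies more than `threshold`
    above the maximum or below the minimum of its eight same-color
    neighbors; borders are left untouched for brevity.
    """
    out = plane.copy()
    h, w = plane.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = plane[y - 1:y + 2, x - 1:x + 2].ravel()
            neigh = np.delete(win, 4)            # drop the center pixel
            p = int(plane[y, x])
            if p > int(neigh.max()) + threshold or p < int(neigh.min()) - threshold:
                out[y, x] = np.median(neigh)
    return out
```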
Adaptive Noise Reduction (ANR) is a general noise reduction function that reduces the graininess of the video by removing various types of temporal sensor noise (as opposed to Fixed Pattern Noise (FPN)). Many factors contribute to temporal noise: photon shot noise, pixel dark current, readout noise, amplifier noise, and quantization noise. We limit ANR to spatial noise reduction techniques in the presented work, leaving temporal analysis for future upgrades.
After DPC and ANR, the noise level of the video image is lowered, and the signal-to-noise ratio (SNR) is improved. The subsequent functions of the pipeline are designated for color enhancement and signal improvement. These operations are susceptible to non-linearities like an artificial black-level pedestal on the video signal. Therefore, removing the black pedestal value at this stage is necessary. Based on BLS statistics, the Black Level Correction (BLC) block removes the black level pedestal value from the pixel values and scales the signal back to the full dynamic range.
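As a software model (a minimal sketch; the pedestal value would come from the BLS block), the correction is a per-pixel subtract-and-rescale:

```python
import numpy as np

def black_level_correct(pixels: np.ndarray, pedestal: float,
                        white: float = 4095.0) -> np.ndarray:
    """Remove the black-level pedestal and rescale to full range.

    `pedestal` is the offset reported by BLS; `white` is the sensor
    saturation code (4095 for a 12-bit ADC). Clipping to zero happens
    only here, after denoising, to keep the noise distribution
    balanced in the earlier stages.
    """
    x = pixels.astype(np.float32) - pedestal
    x *= white / (white - pedestal)    # scale back to full dynamic range
    return np.clip(x, 0.0, white)
```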
The Vignette Correction (VC) function aims to address deficiencies of the optical module. The optical system generally accumulates less light at the video frame's corners than at the center. This introduces various vignetting effects, which strengthen as the field of view becomes wider. The primary purpose of the VC is to use a mesh of coefficients to correct these vignetting effects, with a secondary application to correct certain classes of fixed-pattern noise that the sensor might introduce.

The White Balance Statistics (WBS) block divides an image in the predefined region of interest (ROI) into zones and groups each 2x2 Bayer pixel in the zone into macro pixels. The R/G and B/G ratios are calculated for each macro pixel and accumulated across each zone. A configurable pair of thresholds is used to reject ratios that are too small or too large, since those values do not represent a white area. The software uses those statistics to predict the light source's color temperature. Various CFA dyes used in sensor manufacturing have different light transmittance distributions that cause imbalanced intensity values across different color channels.
The primary use of the White Balance Correction (WBC) function is to correct this intensity imbalance across color channels. However, the imbalance also depends on the light source's color temperature. The automatic white balance (AWB) software applies the appropriate WBC coefficients based on the color temperature prediction it made using WBS.
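The WBS-to-AWB handoff can be modeled in a few lines of Python (an illustrative sketch, not the IP's implementation): ratios outside the plausible-white window are rejected, and the inverse of the surviving averages yields per-channel gains.

```python
import numpy as np

def awb_gains(r, g, b, lo=0.25, hi=4.0):
    """Estimate white-balance gains from subsampled R, G, B planes.

    Computes R/G and B/G per macro pixel, rejects ratios outside
    [lo, hi] (unlikely to be a white area), and returns gains that
    equalize the red and blue channels against green.
    """
    g = np.maximum(g.astype(np.float32), 1.0)
    rg = r.astype(np.float32) / g
    bg = b.astype(np.float32) / g
    mask = (rg > lo) & (rg < hi) & (bg > lo) & (bg < hi)
    r_gain = 1.0 / rg[mask].mean()
    b_gain = 1.0 / bg[mask].mean()
    return r_gain, 1.0, b_gain     # gains for (R, G, B)
```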
The Demosaic function is the last block in the pipeline that operates in the Bayer domain. It converts the CFA video stream into an RGB video stream by generating the two missing color channels of each pixel via interpolation. The rest of the pipeline operates on RGB data.
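For illustration, the sketch below implements the simplest useful variant, a bilinear demosaic via normalized convolution (production demosaic IPs use edge-aware interpolation; the RGGB layout is assumed):

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw: np.ndarray) -> np.ndarray:
    """Bilinear demosaic of an RGGB mosaic into an H x W x 3 image.

    Each channel's known samples are scattered onto a sparse plane;
    the holes are filled by a neighbor-weighted average, normalized
    by the count of contributing samples.
    """
    h, w = raw.shape
    rgb = np.zeros((h, w, 3), np.float32)
    mask = np.zeros((h, w, 3), np.float32)
    rgb[0::2, 0::2, 0] = raw[0::2, 0::2]; mask[0::2, 0::2, 0] = 1  # R
    rgb[0::2, 1::2, 1] = raw[0::2, 1::2]; mask[0::2, 1::2, 1] = 1  # G
    rgb[1::2, 0::2, 1] = raw[1::2, 0::2]; mask[1::2, 0::2, 1] = 1  # G
    rgb[1::2, 1::2, 2] = raw[1::2, 1::2]; mask[1::2, 1::2, 2] = 1  # B
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], np.float32)
    for c in range(3):
        rgb[..., c] = (convolve(rgb[..., c], k, mode="mirror")
                       / np.maximum(convolve(mask[..., c], k, mode="mirror"), 1e-6))
    return rgb
```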
The Histogram Statistics (HS) block calculates two histograms in the luma domain, one across the whole frame and one across a programmable ROI. The auto-exposure (AE) software uses these statistics to calculate the sensor's exposure time, analog gain, and digital gain values.
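A toy model of this statistics-to-control loop is sketched below (assumptions: normalized RGB input, Rec.709 luma weights, and a single multiplicative correction instead of the real split across exposure time, analog gain, and digital gain):

```python
import numpy as np

def ae_correction(rgb: np.ndarray, roi=None, target=0.18, bins=64):
    """Compute an exposure correction factor from a luma histogram.

    Builds the histogram over the whole frame or a (y0, y1, x0, x1)
    ROI and returns the multiplier that moves mean luma toward an
    18%-gray target; a result > 1 means "expose more".
    """
    luma = rgb[..., 0] * 0.2126 + rgb[..., 1] * 0.7152 + rgb[..., 2] * 0.0722
    if roi is not None:
        y0, y1, x0, x1 = roi
        luma = luma[y0:y1, x0:x1]
    hist, edges = np.histogram(luma, bins=bins, range=(0.0, 1.0))
    centers = 0.5 * (edges[:-1] + edges[1:])
    mean = float((hist * centers).sum()) / max(int(hist.sum()), 1)
    return target / max(mean, 1e-6)
```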
The Unsharp Mask (USM) Filter enhances the image's sharpness by boosting or softening the image's high-frequency components, thus increasing the perceived details of the scene.
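In software-model form the USM operation is just a few lines (a sketch on a single normalized luma plane; `sigma` and `amount` are illustrative tuning knobs):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(luma: np.ndarray, sigma: float = 1.5,
                 amount: float = 0.6) -> np.ndarray:
    """Boost (amount > 0) or soften (amount < 0) high frequencies.

    The high-pass component is the difference between the image and
    its Gaussian-blurred copy; adding a scaled copy of it back
    increases perceived detail.
    """
    highpass = luma - gaussian_filter(luma, sigma=sigma)
    return np.clip(luma + amount * highpass, 0.0, 1.0)
```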
The CFA filters used to create individual colors on the sensor are sensitive to a spectrum of wavelengths of light that does not align perfectly with human vision's red, green, and blue perception. Another factor contributing to this misalignment is that each registered color sample is affected by light from the other two color bands. The problem can be corrected by multiplying the [R, G, B] vector with a 3x3 Color Correction Matrix (CCM). A traditional video processing Color Space Converter (CSC) IP is used to apply the CCM coefficients. Like the white balance coefficients, the CCM coefficients depend on the sensor characteristics and the light source's color temperature.
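Applying a CCM is a per-pixel matrix-vector product, as in the Python sketch below (the matrix values are illustrative, not calibrated; each row sums to 1.0 so neutral gray is preserved):

```python
import numpy as np

# Illustrative 3x3 CCM; a real matrix comes from sensor calibration
# at a known color temperature. Each row sums to 1.0.
CCM = np.array([[ 1.6, -0.4, -0.2],
                [-0.3,  1.5, -0.2],
                [ 0.0, -0.5,  1.5]], np.float32)

def apply_ccm(rgb: np.ndarray, ccm: np.ndarray = CCM) -> np.ndarray:
    """Multiply every [R, G, B] pixel vector by the 3x3 CCM."""
    return np.clip(rgb @ ccm.T, 0.0, 1.0)
```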
Most optical systems introduce geometric distortion on the image by rounding the corners of the frame, which is more prominent with lenses with a large field of view, such as fish-eye lenses. To perform geometric distortion correction (GDC), we use the Warp IP (Fig. 3) and application software that maps a correction mesh onto the IP to obtain a rectilinear image where straight lines remain straight. Another benefit of the Warp IP is that it acts as a video frame buffer and scaler. It isolates the input and output video streams, eliminating the need to synchronize them and letting the design run in free-running mode.

Figure 3. Geometric Distortion Correction implemented by Warp IP

High Dynamic Range (HDR) Considerations

The human visual system can adapt to a very high dynamic range, but at any given time, we may only perceive a limited portion of that range. One way of adjusting the image's dynamic range to human vision is tone mapping, where the contrasts of darker and lighter areas are enhanced. It can be achieved using specialized IPs that perform an adaptive, content-aware, real-time local tone mapping operator (TMO) (Fig. 4).

To perform advanced color space conversion, the designer must consider using a 3D look-up table (3D LUT). While a CSC IP can transform color spaces linearly, it is insufficient for applications like generating coverage of the BT.2020 color space for HDR applications or constraining the wide color space to a more limited display like sRGB. Those non-linear operations require the advanced color manipulation capabilities of a 3D LUT to prevent false colors [8].

The final block in the core ISP pipeline is the 1D LUT, whose primary purpose is applying an Opto-Electrical Transfer Function (OETF) suitable for the video sink used. Applying the correct OETF is essential for generating correct light intensities across a variety of displays supporting different Electro-Optical Transfer Functions (EOTF), from legacy gamma curves for SDR to Perceptual Quantization (PQ) and Hybrid Log-Gamma (HLG) used for HDR. The secondary purpose of the 1D LUT IP is to apply an artistic effect on the output image, like a 'toe and knee' curve, to gracefully roll off the highlights and shadows, which acts as a global tone mapping operation complementing the TMO.
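A 1D LUT of this kind can be modeled as below (a sketch assuming normalized linear input; the BT.709 and HLG OETF formulas are standard, while real hardware interpolates between LUT entries rather than using nearest lookup):

```python
import numpy as np

def build_oetf_lut(kind: str = "bt709", size: int = 1024) -> np.ndarray:
    """Tabulate an OETF over normalized linear light [0, 1]."""
    x = np.linspace(0.0, 1.0, size)
    if kind == "bt709":            # legacy SDR gamma with a linear toe
        lut = np.where(x < 0.018, 4.5 * x, 1.099 * x ** 0.45 - 0.099)
    elif kind == "hlg":            # Hybrid Log-Gamma (BT.2100)
        a, b, c = 0.17883277, 0.28466892, 0.55991073
        lut = np.where(x <= 1 / 12, np.sqrt(3 * x),
                       a * np.log(np.maximum(12 * x - b, 1e-6)) + c)
    else:
        raise ValueError(kind)
    return lut.astype(np.float32)

def apply_1d_lut(linear: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Nearest-entry LUT lookup on a normalized image."""
    idx = np.clip((linear * (len(lut) - 1)).round().astype(np.int64),
                  0, len(lut) - 1)
    return lut[idx]
```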
Output Connectivity IPs and Other Functions
The intermediate output of the ISP after the 3D LUT is
connected to a video frame writer used as a frame grabber
to write frames to the external DDR memory.
A Mixer overlays the output of a custom icon generator onto
the ISP output and adds a letterboxing pattern or color
generated by the TPG.
The output of the 1D LUT is converted from lite to full
streaming video [9] and passed to an HDMI or DisplayPort
(DP) Tx connectivity IP.
Software Stack
The stack comprises application software built on several
layers of APIs and IP drivers with Yocto Linux recipes for the
presented camera solution. A discovery mechanism scans
the Avalon CPU interface on the FPGA and creates a device
tree at boot-up time. Such an approach enables the
decoupling of hardware and software development by
eliminating the need to synchronize static device trees across
different FPGA builds.
Application Software
The application software consists of individual modules
responsible for driving individual FPGA IPs. These SW
modules are disabled if their corresponding IPs are not in the
design. Most IPs have a dedicated section on a web server-
based GUI to configure the camera system. Input and output
resolutions, exposure and color correction modes/settings,
modes of individual IPs, and bypass functionalities are
examples of the settings exposed on the GUI (Fig. 5). It also
allows for fine-tuning more complicated IPs, such as creating
a custom Warp mesh, applying various OETF functions in 1D
LUT, uploading 3D LUT cube files, and visualizing the
histograms.
The back end of the application software is also responsible
for supporting multiple feedback loops across various
portions of the imaging pipeline. Some of those control loops
are self-contained, like the Warp software applying a
tessellation to the Warp IP, while others, like resolution
change, trigger a change for most of the IPs in the pipeline.
For the VC mesh, a high-quality diffuser is used to capture a perfectly uniform scene. The mesh coefficients are calculated from the inverse of the capture, averaged across blocks. Any operation that changes the focal distance causes a change in the vignetting effect; therefore, it might also be desirable to calibrate a set of meshes for various focal distances and update the VC accordingly. Zooming and focusing change the focal distance, which may be mitigated with expensive optics, but that is not practical for most imaging applications.
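The block-averaged inverse can be expressed compactly (a sketch of the calibration math under the stated flat-field assumption; mesh geometry and interpolation details follow the VC IP's configuration):

```python
import numpy as np

def vignette_mesh(flat_field: np.ndarray, blocks=(16, 16)) -> np.ndarray:
    """Derive per-block vignette-correction gains from a flat capture.

    Averages the uniform-scene (diffuser) capture over a coarse grid
    and returns the inverse of the normalized falloff: gain 1.0 at
    the brightest block, rising toward the darker corners. Hardware
    interpolates between mesh points at pixel rate.
    """
    by, bx = blocks
    h, w = flat_field.shape
    crop = flat_field[: h - h % by, : w - w % bx].astype(np.float32)
    mesh = crop.reshape(by, h // by, bx, w // bx).mean(axis=(1, 3))
    return mesh.max() / np.maximum(mesh, 1e-6)
```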
The GDC mesh is calculated similarly, preferably from a scene containing a grid with straight lines. The Warp GUI provides a few interfaces for easy manual calibration, while it might also be preferable to calculate the mesh coefficients offline with more automation.

The rest of the IPs do not require calibration. However, in some applications, creating more calibration tables to drive additional feedback loops might be desirable.

Furthermore, modern toolkits and methodologies, like Intel's OpenVINO™ and Altera's FPGA AI Suite [12], allow designers to streamline porting ML models developed in conventional frameworks like TensorFlow and PyTorch onto FPGAs. A set of tools helps optimize the model, produce an optimal architecture within specified constraints, and generate an inference IP ready to be integrated into the design and deployed in the field. The overall development process within the FPGA AI Suite is outlined in Fig. 6. Extending an existing design with AI capabilities requires straightforward HW integration of the generated AI inference IP and the addition of a Video Frame Reader to overlay the required detection graphics, as depicted in Fig. 7. Software integration requires elaborate handling of the input frame queue, running inference, performing further analysis of the produced detections (such as spatiotemporal filtering, grouping, etc.), and synchronization with the real-time video stream via a frame buffer.
Figure 6. FPGA AI Suite development flow: (1) the Model Optimizer converts a trained AI model into an Intermediate Representation (IR data, .xml/.bin) consumed by the Inference Engine and its plugin-runtime environment on an x86 or Arm* CPU (software runtime support flow); (2) the FPGA AI Suite Architecture Optimizer takes FPGA resource specifications (LEs/ALMs, DSP blocks, RAM blocks) and produces an architecture description (.arch) for the FPGA AI Suite IP Generator; (3) the generated DLA IP is combined with other IP and custom RTL in Platform Designer; (4) the design is built with Quartus® Prime software and board files (FPGA hardware support flow).
Figure 7. Block diagram of the camera solution integrated with the AI Suite
Additional Options

This section describes extensions to the presented solution that are available either by using pre-existing plug-and-play IPs or by direct customization.

Alternative CFAs and Infrared
One of the limitations of the presented solution is the lack of support for non-Bayer CFA patterns like RGBIr, RCCB, RCCC, and various flavors of quad CFA sensors. The presented architecture can be extended to cover all non-quad CFA patterns and to create a secondary pipeline for the Ir channel using cores such as the Color Plane Manager (CPM) (Fig. 8). A separate grayscale pipeline might also benefit other monochrome sensor pipelines. Another option for the Ir channel is to mix it into the RGB channels in low-light conditions or to recover colors from the RGB channels for applications without Ir cutoff filters.
Another application-specific extension for the presented design is the enablement of an end-to-end HDR pipeline. Extending the linear video pipeline limit of the processing cores beyond 16 bits per color sample (bps) is possible. A further potential extension of the HDR capability is enabling support for multi-exposure capture and merging (stitching) of 2 or 3 different exposures (capturing both bright and dark details) into a single image. Note that on-sensor multi-exposure options are still usable up to 16 bps with the presented design.
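A minimal two-exposure merge can be sketched as follows (illustrative only; assumes normalized linear captures, a known exposure ratio, and a simple blend near saturation rather than a production stitch):

```python
import numpy as np

def merge_exposures(short: np.ndarray, long: np.ndarray,
                    ratio: float = 16.0, knee: float = 0.85) -> np.ndarray:
    """Merge a short/long exposure pair into one linear HDR frame.

    `ratio` is the exposure ratio (16x = 4 stops). Where the long
    exposure nears saturation (above `knee`), the rescaled short
    exposure is blended in, extending dynamic range by log2(ratio)
    bits; e.g., a 12-bit sensor with a 16x ratio yields ~16-bit data.
    """
    w = np.clip((long - knee) / (1.0 - knee), 0.0, 1.0)  # 0 trusts long
    return (1.0 - w) * long + w * (short * ratio)
```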
One of the critical features for specific applications not enabled in this design is Auto Focus (AF). For contrast-based AF, a contrast detector collects statistics while feedback software advances the focus point in search of maximum contrast. There are also multiple on-sensor technologies like OCL, where the sides of some pixels are covered to detect the intensity difference between left and right angles, whose phase shift correlates with the number of focus steps needed for optimum focus. Some sensors implement a more advanced version of phase detection with all pixels divided into left and right subpixels. Yet another methodology is to use laser ranging to the subject, which might be a single beam or a more advanced lidar with a point cloud.
Figure 8. Extending the presented camera solution with an infrared splitter and dedicated monochromatic ISP pipeline

Multi-sensor fusion and synchronization are other critical capabilities commonly used in industrial and consumer applications. They are particularly useful for 3D imaging, 180- or 360-degree stitching, digital pan-tilt-zoom (PTZ), and multi-modal sensor fusion, such as mixing the visible spectrum with depth, infrared, LIDAR, etc. The deterministic latency of our presented ISP solution already enables multiple imaging streams to stay synchronous across the pipeline, enabling AR/VR applications out of the box. The next phase of this work is to extend this design to dual-sensor support.
Conclusions

This paper presents a design methodology for developing FPGA-based camera solutions for various applications using off-the-shelf IP components. Coupled with FPGA design techniques, the outlined approach enables significant resource and power savings for image-processing applications with throughputs of up to 4.8 Gpixel/s per sensor. We cover critical components of the overall system, such as connectivity, the image processing pipeline, high-dynamic-range handling, and the software stack. Furthermore, optional extensions to the solution and AI functionality integration allow for extending its direct application into multiple domains of machine vision.

References

[1] A. Lopich et al., "A Framework for High-Performance Image and Video Processing on FPGAs." [Online]. Available: https://www.intel.com/content/www/us/en/content-details/813481/a-framework-for-high-performance-image-and-video-processing-on-fpgas.html
[2] Sony, "Quad Bayer Coding." [Online]. Available: https://www.sony-semicon.com/en/technology/mobile/quad-bayer-coding.html
[3] Sony, "All-pixel Auto Focus (AF) Technology." [Online]. Available: https://www.sony-semicon.com/en/technology/mobile/autofocus.html
[4] Sony, "SWIR Image Sensor." [Online]. Available: https://www.sony-semicon.com/en/products/is/industry/swir.html
[5] Altera, "Video and Vision Processing Suite." [Online]. Available: https://www.intel.com/content/www/us/en/products/details/fpga/intellectual-property/dsp/video-vision-processing-suite.html
[6] MIPI Alliance, "Camera Serial Interface 2 (MIPI CSI-2)." [Online]. Available: https://www.mipi.org/specifications/csi-2
[7] CoaXPress. [Online]. Available: https://www.coaxpress.com/
[8] Altera, "3D LUT IP for FPGA." [Online]. Available: https://www.intel.com/content/www/us/en/products/details/fpga/intellectual-property/dsp/3d-lut.html
[9] Altera, "Altera FPGA Streaming Video Protocol." [Online]. Available: https://www.intel.com/content/www/us/en/docs/programmable/683397/current/about-the-intel-fpga-streaming-video.html
[10] European Machine Vision Association, "EMVA 1288." [Online]. Available: https://www.emva.org/standards-technology/emva-1288/
[11] J. Ahmed, M. Jeervis, and R. Venkata, "Altera® FPGAs and SoCs with FPGA AI Suite and OpenVINO Toolkit Drive Embedded/Edge AI/Machine Learning Applications." [Online]. Available: https://www.intel.com/content/www/us/en/content-details/765466/altera-fpgas-and-socs-with-fpga-ai-suite-and-openvino-toolkit-drive-embedded-edge-ai-machine-learning-applications-white-paper.html?DocID=765466
[12] Altera, "FPGA AI Suite - AI Inference Development Platform." [Online]. Available: https://www.intel.com/content/www/us/en/software/programmable/fpga-ai-suite/overview.html
WP-01337-1.0