
A Stereo Perception Framework for Autonomous Vehicles

Narsimlu Kemsaram, Anweshan Das, and Gijs Dubbelman

Abstract— Stereo cameras are crucial sensors for self-driving vehicles as they are low-cost and can be used to estimate depth. They can be used for multiple purposes, such as object detection, depth estimation, semantic segmentation, etc. In this paper, we propose a stereo vision-based perception framework for autonomous vehicles. It uses three deep neural networks simultaneously to perform free-space detection, lane boundary detection, and object detection on image frames captured using the stereo camera. The depth of the detected objects from the vehicle is estimated from the disparity image computed using the two stereo image frames from the stereo camera. The proposed stereo perception framework runs at 7.4 Hz on the Nvidia Drive PX 2 hardware platform, which further allows for its use in multi-sensor fusion for localization, mapping, and path planning by autonomous vehicle applications.

Index Terms— advanced driver assistance system, autonomous vehicle, deep neural network, depth estimation, free space detection, lane detection, object detection, stereo camera, stereo perception, stereo vision.

Narsimlu Kemsaram ([email protected]), Anweshan Das ([email protected]), and Gijs Dubbelman ([email protected]) are with the Department of Electrical Engineering, Mobile Perception Systems Research Lab, Video Coding and Architectures/Signal Processing Systems Research Group, Eindhoven University of Technology (TU/e), 5612 AZ Eindhoven, The Netherlands.

I. INTRODUCTION

Fig. 1: Block Diagram of the Proposed Stereo Perception Framework.

Perceiving the environment accurately in real-time is one of the most challenging tasks for autonomous vehicles. Perception refers to the ability of the autonomous vehicle to collect sensor data, extract relevant knowledge, and develop a contextual understanding of the environment, for example, the detection of obstacles, lanes, and the drivable area in front of the vehicle [1]. Sensors such as cameras, lidars, and radars are used in autonomous driving vehicles to perceive the environment around them. LiDARs are very accurate active depth measurement sensors, but they are very expensive and not ready to be equipped on consumer-grade vehicles. Radars are robust sensors used in advanced driver assistance systems (ADAS) like adaptive cruise control (ACC), blind-spot detection, etc., but the resolution of radar is not high enough to extract semantic information about the surroundings. Cameras, on the other hand, are the only sensor that is comparatively inexpensive and provides a high level of detail, such as color, contrast, and texture information, which allows for a better semantic understanding of the environment and can also be used to estimate depth. With ever-increasing performance in dynamic lighting conditions and being relatively cheap to manufacture, camera-based systems are widely used in today's ADAS and autonomous vehicles. Monocular camera-based systems are generally used for ADAS applications such as ACC [2], forward collision warning (FCW) [3], lane departure warning (LDW) [4], etc.

In recent years there has been a lot of advancement in the field of deep learning. Deep neural networks (DNNs), specifically different variants of the convolutional neural network (CNN), have been used in the field of computer vision for complex tasks like object detection [5], pattern recognition [6], segmentation [7], depth estimation [8], etc. They even outperform human beings in tasks like object recognition and pattern recognition [9]. The main drawback of complex DNNs is that they are computationally very expensive and cannot meet real-time system constraints. But, with recent developments in graphics processing units (GPUs) and DNN-optimized hardware platforms, it is becoming easier to deploy DNNs for real-time applications.

Depth estimation of objects in the surroundings can be performed using a monocular camera or a stereo camera. There is no direct method to estimate depth from a monocular camera: the depth information of the surroundings can be estimated by assuming the surface to be planar or by using image feature tracking with additional information about the twist of the camera in subsequent frames [10]. DNN-based monocular depth estimation techniques are not robust enough to be used in automotive-grade applications. Stereo cameras use the intrinsic projective geometry between two views, which is independent of scene structure and only depends on the cameras' internal and external parameters. Stereo vision is widely used for depth estimation in the robotics and automotive domain.



In this paper, we propose and evaluate a stereo camera-based perception framework for autonomous vehicles. Figure 1 shows the block diagram of the stereo perception framework. The framework uses DNNs to perform lane boundary detection, free space detection, and object detection and classification on the left image frame of the stereo camera. It uses the left and right image frames of the stereo camera to compute a disparity image and then estimates the depth of the detected objects. The framework requires a lot of parallel processing power from GPUs. PCs with consumer-grade GPUs consume a lot of electricity, and vehicles do not produce that much electrical energy. We therefore use Nvidia's Drive PX 2 platform to deploy the framework [11]. The Drive PX 2 is a very powerful and efficient automotive-grade platform which can be used to deploy highly optimized DNNs for real-time applications.

The main contributions of this paper are: i) we developed and integrated a stereo camera-based perception framework for autonomous vehicles, ii) we developed a depth estimation module, and iii) we evaluated the performance of the depth estimation and the stereo perception framework in real-time.
This paper is structured as follows: Section II discusses the camera calibration process. Section III provides details of the stereo vision module. Section IV provides details of the DNN module. Section V explains the proposed stereo perception framework in detail. Section VI explains the experimental setup in detail. Section VII discusses the experimental results. Section VIII evaluates the performance of the depth estimation and the stereo perception framework. Section IX presents the conclusions.
II. CAMERA CALIBRATION

The process of estimating the intrinsic and extrinsic parameters of a camera is called camera calibration [12]. Since the manufacturing process of camera sensors and their lenses is never perfect, precise camera calibration is essential to re-project 2D images into the 3D world. The intrinsic parameters characterize the geometric, digital, and optical characteristics of the camera and are specific to each camera. They are composed of the principal point or optical center (c_x, c_y), the focal length (f_x, f_y), the pixel size (p_x, p_y), the skew coefficient (s), the camera image resolution, and the lens distortion coefficients (k_1, k_2, p_1, p_2). The extrinsic parameters represent the six degrees of freedom (6DoF) pose of the camera in the world and are given by a translation vector T and a rotation matrix R [13]. Stereo camera calibration is the process of estimating the intrinsic and extrinsic parameters of the two cameras in the stereo setup.

We use the Matlab stereo calibration toolbox [14] to perform stereo calibration. It uses multiple images of a checkerboard pattern captured using the stereo camera to estimate the intrinsic and extrinsic parameters; for more details on how the toolbox works, please refer to [14]. The calibration toolbox computes the extrinsic parameters of the stereo camera with the left camera's optical center as the origin, which then needs to be transformed to the vehicle coordinate system using a rigid body transformation. The vehicle uses a right-handed coordinate system, where the vehicle origin is considered to be under the center of the rear axle: the x-axis points forward to the front of the vehicle, the y-axis points to the left of the vehicle, and the z-axis points upward. The camera also has a right-handed coordinate system, where the camera origin is at the optical center of the left camera: the x-axis points to the right of the image plane, the y-axis points to the bottom of the image plane, and the z-axis points forward along the optical axis. The estimated intrinsic and transformed extrinsic parameters are written to a rig configuration file in XML format, which is used in the stereo perception framework.
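The rig configuration groups these quantities per camera. The exact DriveWorks rig XML schema is not reproduced here; the following C++ sketch only illustrates, with assumed field names, the calibration data such a file has to carry.

```cpp
#include <array>
#include <cstdint>

// Illustrative grouping of the calibration parameters described above.
// Field names are assumptions for this sketch, not the DriveWorks rig schema.
struct Intrinsics {
    double fx, fy;                     // focal length in pixels
    double cx, cy;                     // principal point (optical center)
    double skew;                       // skew coefficient s
    uint32_t width, height;            // camera image resolution
    std::array<double, 4> distortion;  // lens distortion coefficients k1, k2, p1, p2
};

struct Extrinsics {                    // 6DoF pose of the camera in the vehicle frame
    std::array<double, 9> R;           // 3x3 rotation matrix, row-major
    std::array<double, 3> T;           // translation vector, in meters
};

struct StereoRig {
    Intrinsics left, right;            // per-camera intrinsics
    Extrinsics leftCameraToVehicle;    // rigid body transform produced after calibration
    double baselineMeters;             // distance between the two optical centers
};
```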
III. STEREO VISION

The stereo vision module computes the depth of the detected objects in the stereo perception framework. It is divided into three parts, namely stereo rectification, disparity computation, and depth estimation.

A. Stereo Rectification

The left and right image frames of the stereo camera are undistorted and rectified before computing disparity. The left and right cameras are modeled as pinhole cameras [15]. Image undistortion refers to the process of removing lens distortion artifacts from the image. Lens distortion is modeled as two types, namely radial distortion and tangential distortion. Straight lines in an image appear curved due to radial distortion, and the effect becomes more prominent as we move away from the center of the image; it is represented using two coefficients k_1, k_2. Tangential distortion occurs if the lens is not mounted parallel to the image sensor; it is represented using two coefficients p_1, p_2. Consider a point in 3D that, when projected onto the camera image plane, is represented as (x_1, y_1) but, due to lens distortion, appears at (x_2, y_2). The distorted point is described by the following functions [16]:

x_2 = x_1 (1 + k_1 r^2 + k_2 r^4) + 2 p_1 x_1 y_1 + p_2 (r^2 + 2 x_1^2)    (1)

y_2 = y_1 (1 + k_1 r^2 + k_2 r^4) + 2 p_2 x_1 y_1 + p_1 (r^2 + 2 y_1^2)    (2)

where r is the distance of the pixel from the optical centre, r = sqrt(x_1^2 + y_1^2). These coefficients are estimated during the camera calibration process.
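Equations (1) and (2) define the forward mapping from an ideal (undistorted) point to its observed, distorted location. A minimal C++ sketch of that mapping is shown below; undistorting an image inverts this mapping, typically through a precomputed remap table.

```cpp
// Applies the radial-tangential distortion model of Eqs. (1)-(2) to an
// undistorted, normalized image point (x1, y1) and returns the distorted
// point (x2, y2). Coefficient names follow the paper (k1, k2, p1, p2).
struct Point2d { double x, y; };

Point2d distortPoint(const Point2d& p, double k1, double k2, double p1, double p2)
{
    const double x1 = p.x, y1 = p.y;
    const double r2 = x1 * x1 + y1 * y1;            // r^2, r = distance to the optical centre
    const double radial = 1.0 + k1 * r2 + k2 * r2 * r2;
    Point2d d;
    d.x = x1 * radial + 2.0 * p1 * x1 * y1 + p2 * (r2 + 2.0 * x1 * x1);  // Eq. (1)
    d.y = y1 * radial + 2.0 * p2 * x1 * y1 + p1 * (r2 + 2.0 * y1 * y1);  // Eq. (2)
    return d;
}
```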
The undistorted left and right camera images are then rectified. Image rectification is a transformation process that projects the left and right camera images onto a common plane parallel to the line between the optical centers of the cameras, such that the epipolar lines become collinear and parallel to the horizontal image axes. In simple words, the transformation is such that the projection of every point in the left image has its corresponding point in the right image on the same horizontal line, collinear and parallel to the horizontal image axes. This limits the search space for stereo correspondence [17]. For more details on the image rectification process, refer to [18].
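The framework itself performs this step with the Nvidia DriveWorks SDK (Section V). Purely as an illustrative alternative, assuming the calibration parameters of Section II are available, the same undistort-and-rectify step can be expressed with OpenCV's stereo routines:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>

// Illustrative undistortion + rectification with OpenCV (not the DriveWorks path
// used in the paper). K_l, K_r are 3x3 intrinsic matrices, d_l, d_r hold
// (k1, k2, p1, p2), and R, T describe the right camera relative to the left.
void rectifyPair(const cv::Mat& left, const cv::Mat& right,
                 const cv::Mat& K_l, const cv::Mat& d_l,
                 const cv::Mat& K_r, const cv::Mat& d_r,
                 const cv::Mat& R, const cv::Mat& T,
                 cv::Mat& leftRect, cv::Mat& rightRect)
{
    cv::Mat R1, R2, P1, P2, Q;
    cv::stereoRectify(K_l, d_l, K_r, d_r, left.size(), R, T, R1, R2, P1, P2, Q);

    cv::Mat map1l, map2l, map1r, map2r;
    cv::initUndistortRectifyMap(K_l, d_l, R1, P1, left.size(),  CV_32FC1, map1l, map2l);
    cv::initUndistortRectifyMap(K_r, d_r, R2, P2, right.size(), CV_32FC1, map1r, map2r);

    // After remapping, corresponding points lie on the same image row.
    cv::remap(left,  leftRect,  map1l, map2l, cv::INTER_LINEAR);
    cv::remap(right, rightRect, map1r, map2r, cv::INTER_LINEAR);
}
```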

B. Stereo Disparity

The disparity of all pixels, also known as the disparity map, is computed using the stereo image pair. The disparity is the distance between two corresponding points in the left and right images of a stereo pair. The problem of finding pixel correspondences between a stereo image pair is called stereo matching. After image rectification, the search space for the corresponding pixel is constrained to the epipolar line. Pixels are matched by comparing the sum of absolute differences (SAD), the sum of squared differences (SSD), or the normalized cross-correlation (NCC) of the intensities of the pixels around them [19]. The disparity d of each pixel is represented as:

d = u_l - u_r    (3)

where u_l and u_r are the horizontal positions of the corresponding pixel on the left and right image planes.
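As an illustration of the matching principle only (the framework uses the DriveWorks disparity module, see Section V), a brute-force SAD search for a single left-image pixel on a rectified pair could look as follows; the window size and disparity range are assumed parameters.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// SAD block matching for one left-image pixel (u, v) on a rectified pair:
// corresponding pixels lie on the same row, so the search runs over candidate
// disparities only. Production implementations are far more elaborate; this
// only demonstrates the principle behind Eq. (3).
int matchDisparitySAD(const std::vector<uint8_t>& left, const std::vector<uint8_t>& right,
                      int width, int height, int u, int v,
                      int maxDisparity, int window /* half window size */)
{
    int bestDisparity = 0;
    long bestCost = std::numeric_limits<long>::max();
    for (int d = 0; d <= maxDisparity && u - d - window >= 0; ++d) {
        long cost = 0;  // sum of absolute intensity differences over the window
        for (int dy = -window; dy <= window; ++dy) {
            const int y = v + dy;
            if (y < 0 || y >= height) continue;
            for (int dx = -window; dx <= window; ++dx) {
                const int xl = u + dx, xr = u - d + dx;
                if (xl < 0 || xl >= width || xr < 0 || xr >= width) continue;
                cost += std::labs(static_cast<long>(left[y * width + xl]) -
                                  static_cast<long>(right[y * width + xr]));
            }
        }
        if (cost < bestCost) { bestCost = cost; bestDisparity = d; }
    }
    return bestDisparity;  // d = u_l - u_r, as in Eq. (3)
}
```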
C. Depth Estimation

The output of the object detection and tracking module, which is explained in Section IV-A, is a bounding box around each detected object along with its label. The pixel disparity values of the detected objects are available from the computed disparity map. The depth of a pixel is estimated from its disparity value using the triangulation equation from the stereo geometry [20]:

z = f * b / d    (4)

where z is the depth, f is the focal length, b is the baseline distance between the left and right cameras, and d is the disparity.
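Equation (4) translates directly into code. In the sketch below, f is expressed in pixels and b in meters, so z comes out in meters; larger disparities correspond to closer objects.

```cpp
// Direct implementation of Eq. (4): z = f * b / d.
// focalPx: focal length in pixels, baselineM: baseline in meters,
// disparityPx: disparity in pixels for the pixel of interest.
inline float depthFromDisparity(float focalPx, float baselineM, float disparityPx)
{
    if (disparityPx <= 0.0f) return -1.0f;  // invalid disparity or point at infinity
    return focalPx * baselineM / disparityPx;
}
```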
IV. DEEP NEURAL NETWORK

This part of the framework processes image frames to understand the surroundings of the autonomous vehicle. It consists of three modules, namely object detection and tracking, lane detection, and free space detection, which are explained in detail below.

A. Object Detection and Tracking

The object detection and tracking module is used to provide semantic information about the surroundings of the autonomous vehicle. This module consists of three parts: object detection, object clustering, and object tracking. We use Nvidia's proprietary DNN called DriveNet [21] to perform object detection. The input to the object detection network is an RCCB (Red-Clear-Clear-Blue) image and the output is object proposals with bounding boxes. Each object can have multiple proposals; the object clustering algorithm clusters these multiple proposals into one bounding box for each detected object. The object tracking algorithm tracks the detected bounding boxes to maintain temporal consistency. The module detects and tracks six different classes of objects: car, truck, person, bicycle, traffic sign, and road sign. It overlays bounding boxes on the detected objects, and the colors of the bounding boxes represent the detected classes as follows: red for cars, cyan for trucks, green for persons, blue for bicycles, yellow for traffic signs, and magenta for road signs.
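The clustering and tracking steps belong to Nvidia's proprietary pipeline and are not specified in detail here. Purely to illustrate the idea of merging multiple proposals into one box per object, a generic greedy grouping by intersection-over-union (IoU) could look as follows; the threshold and the box-averaging rule are assumptions, not DriveNet's actual algorithm.

```cpp
#include <algorithm>
#include <vector>

// Greedy IoU-based grouping of detection proposals into one box per object.
// This is a generic illustration only, not Nvidia's clustering algorithm.
struct Rect { float x, y, w, h; };

static float iou(const Rect& a, const Rect& b)
{
    const float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Proposals that overlap an already-kept box by more than `threshold` are
// absorbed into it (by averaging); otherwise they start a new cluster.
std::vector<Rect> clusterProposals(const std::vector<Rect>& proposals, float threshold = 0.5f)
{
    std::vector<Rect> clusters;
    for (const Rect& p : proposals) {
        bool merged = false;
        for (Rect& c : clusters) {
            if (iou(p, c) > threshold) {
                c.x = 0.5f * (c.x + p.x); c.y = 0.5f * (c.y + p.y);
                c.w = 0.5f * (c.w + p.w); c.h = 0.5f * (c.h + p.h);
                merged = true;
                break;
            }
        }
        if (!merged) clusters.push_back(p);
    }
    return clusters;
}
```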
B. Lane Detection

A robust and accurate lane detection system is crucial to ADAS systems like the Lane Keep Assist System (LKAS) and the Lane Departure Warning System (LDWS), and also provides vital information to autonomous vehicles. We use Nvidia's proprietary DNN called LaneNet [21] to perform lane detection. The input to this network is an RCCB (Red-Clear-Clear-Blue) image and the output is polylines representing lane markings. It calculates a probability map of lane markings for each pixel using an encoder-decoder architecture on the input image. The map is then binarized into clusters of lane markings, through which polylines are fitted to assign lane position types. It recognizes four different types of lane markings, namely left adjacent-lane, left ego-lane, right ego-lane, and right adjacent-lane, when they are present on the road. The lane detection module overlays polylines on the detected lane markings, and the colors of the polylines represent the lane marking types as follows: yellow for the left adjacent-lane, red for the left ego-lane, green for the right ego-lane, and blue for the right adjacent-lane.

C. Free Space Detection

Free space detection provides critical information about the drivable space to the navigation system of an autonomous vehicle. We use Nvidia's proprietary DNN called OpenRoadNet [21] to perform free space detection. The input to the network is an RCCB (Red-Clear-Clear-Blue) image and the output is a boundary across the image from left to right. The boundary separates the obstacles from the open road space. Each pixel on the boundary is associated with one of four semantic labels: red for vehicle, blue for pedestrian, green for curb, and yellow for other.

V. PROPOSED STEREO PERCEPTION FRAMEWORK

In this Section, we present the stereo vision-based perception framework for autonomous vehicles. The functional architecture of the stereo perception framework that is deployed on the Nvidia Drive PX 2 platform is shown in Figure 2.

The input to the framework is a synchronized raw stereo image pair from a stereo camera or a video file. We use a custom-made stereo camera manufactured using two AR0231 GMSL cameras. The stereo camera is calibrated using a stereo calibration tool, as mentioned in Section II, and the camera calibration parameters are read from the rig configuration file during the initialization of the framework. Camera synchronization is guaranteed as the ports on which the two cameras are connected are hardware synchronized. The images from the cameras are in Bayer RCCB (Red-Clear-Clear-Blue) format, which is converted to RGBA (Red-Green-Blue-Alpha) format before the rectification process. The left and right images are then undistorted and rectified, as explained in Section III-A. We use the stereo rectification functionality provided in the Nvidia DriveWorks software development kit (SDK) to perform this task. The rectified left and right camera images are converted to gray-scale images, and a pyramid of Gaussian images is built up to a specified level.

Fig. 2: Functional Architecture of the Proposed Stereo Perception Framework.

The level 0 image of the pyramid, i.e., the full-resolution gray-scale image, is used for disparity computation. We use the SSD pixel matching technique to find the stereo correspondence of every pixel of the left image with the right image and compute the disparity map with respect to the left image, as explained in Section III-B. The Nvidia DriveWorks SDK provides the disparity computation library, and it returns both the disparity map and the disparity confidence map of the left image. The disparity and confidence maps are used to generate a colored disparity map, which is displayed as output, with invalid pixels displayed in black.

The rectified left camera image from the stereo rectifier is passed as input to the DNN module, as explained in Section IV. The object detector and tracker described in Section IV-A outputs the regions of interest of the detected objects as bounding boxes with their classes. The depth estimator computes the depth of each detected object by utilizing the computed disparity map of the left image. We compute the disparity of each object as the average disparity over one third of the bounding box area around its centre. This filters out outliers near the edges of the bounding box. The depth of each object is then computed as explained in Section III-C.
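One way to realize this central-region averaging is sketched below: the valid disparities inside a centered window covering roughly one third of the bounding-box area are averaged and then triangulated with Eq. (4). The exact window shape used by the framework is an assumption here.

```cpp
#include <cmath>
#include <vector>

// Per-object depth: average valid disparities inside a centered sub-window of
// ~1/3 of the bounding-box area, then apply Eq. (4): z = f * b / d.
struct BoundingBox { int x, y, width, height; };   // top-left corner + size, in pixels

float objectDepthMeters(const std::vector<float>& disparity, int imgWidth, int imgHeight,
                        const BoundingBox& box, float focalPx, float baselineM)
{
    // Scale each side by 1/sqrt(3) so the centered window keeps ~1/3 of the area.
    const float s = 1.0f / std::sqrt(3.0f);
    const int w = static_cast<int>(box.width * s);
    const int h = static_cast<int>(box.height * s);
    const int x0 = box.x + (box.width - w) / 2;
    const int y0 = box.y + (box.height - h) / 2;

    double sum = 0.0;
    int count = 0;
    for (int v = y0; v < y0 + h; ++v) {
        for (int u = x0; u < x0 + w; ++u) {
            if (u < 0 || v < 0 || u >= imgWidth || v >= imgHeight) continue;
            const float d = disparity[v * imgWidth + u];
            if (d > 0.0f) { sum += d; ++count; }   // skip invalid (non-positive) disparities
        }
    }
    if (count == 0) return -1.0f;                  // no valid disparity inside the window
    const float meanDisparity = static_cast<float>(sum / count);
    return focalPx * baselineM / meanDisparity;    // Eq. (4)
}
```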
The lane detector described in Section IV-B can classify four different lane markings within an image and overlays the recognized lane markings on the output image. The free space detector described in Section IV-C identifies the drivable collision-free space within the image and overlays the identified drivable area with a separation boundary on the output image.

The output image is converted to an OpenGL image, and the outputs of the object detector and tracker, depth estimator, lane detector, and free space detector are overlaid on it before the image rendering process. The image rendering process renders the results from the previous modules in a meaningful way to the user through the in-vehicle Tegra A/Tegra B HDMI computer monitors.
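The per-frame control flow described in this Section can be summarized by the following skeleton. It is only an illustration of the ordering of the stages; all types and stage functions are placeholders supplied by the caller, and the real implementation relies on the DriveWorks SDK and the Nvidia DNNs.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Placeholder types for this sketch; not the DriveWorks data structures.
struct Image { int width = 0, height = 0; std::vector<uint8_t> data; };
struct Box { int x = 0, y = 0, w = 0, h = 0; int classId = 0; float depthMeters = 0.0f; };

struct StereoPerceptionPipeline {
    // Stages, in the order the framework applies them to each synchronized frame pair.
    std::function<void(Image&, Image&)> rectify;                  // Section III-A
    std::function<Image(const Image&, const Image&)> disparity;   // Section III-B (left disparity)
    std::function<std::vector<Box>(const Image&)> detectObjects;  // DriveNet-like detector
    std::function<void(const Image&)> detectLanes;                // LaneNet-like detector
    std::function<void(const Image&)> detectFreeSpace;            // OpenRoadNet-like detector
    std::function<float(const Image&, const Box&)> estimateDepth; // Section III-C
    std::function<void(const Image&, const std::vector<Box>&)> render;

    void processFrame(Image left, Image right) {
        rectify(left, right);                       // undistort + rectify both images
        Image disp = disparity(left, right);        // disparity w.r.t. the left image
        std::vector<Box> objects = detectObjects(left);
        for (Box& b : objects)
            b.depthMeters = estimateDepth(disp, b); // depth per detected object
        detectLanes(left);                          // lane polylines for the overlay
        detectFreeSpace(left);                      // drivable-space boundary for the overlay
        render(left, objects);                      // compose and display the output image
    }
};
```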
VI. EXPERIMENTAL SETUP

The TU/e-TASS International highly automated driving research prototype vehicle, based on a 3rd generation hybrid Toyota Prius, is used to deploy and demonstrate the proposed stereo perception framework. This vehicle is equipped with a GMSL (Gigabit Multimedia Serial Link) stereo camera, an Nvidia Drive PX 2 hardware platform, an Ubuntu 16.04 based computer with an Intel Core i7-7700K CPU, an Nvidia Titan XP GPU, and 32 GB RAM, two LG 21-inch 60 Hz 1920x1080 Full HD IPS LCD HDMI computer monitors with HDMI cables to connect to the Drive PX 2, a Logitech K400 Plus wireless touch keyboard, a 10 Gigabit Ethernet switch, a Huawei 3G/4G/WiFi modem/router to provide internet access to the computer and the Drive PX 2, and CAT7 cables supporting the 10 Gigabit Ethernet protocol to connect the computer and the Drive PX 2 via the Ethernet switch. All of the equipment is powered using a 2000 W 12 V DC to 220 V AC converter connected to the battery of the vehicle. Figure 3 shows the prototype vehicle along with the used hardware components.

Fig. 3: An Autonomous Research Vehicle Platform: (a) A Toyota Prius Vehicle equipped with (b) Nvidia Drive PX 2 (in the trunk of the vehicle), (c) GMSL Stereo Camera (at the rear-view mirror), and (d) HDMI Computer Monitors (at the back side of the vehicle front seats).

We use a custom-built stereo camera, which is composed of two identical Sekonix SF3323 GMSL automotive cameras with an ONSEMI CMOS AR0231 image sensor [22], a 1928x1208 resolution (2.3 Megapixel), a 60 degree field of view (FOV), a 5.8 mm focal length, a baseline of 30 cm, and a FAKRA (Fachkreis Automobil, a German standard) connector. The stereo camera is firmly fixed using a rigid mounting bar, high up at the rear-view mirror position, at the inner center of the windshield, aligning the camera center vertically with the horizon.

We use the Nvidia Drive PX 2 AutoChauffeur as the embedded hardware platform. It contains two Parker SoCs (System on Chip), called Tegra A and Tegra B, two discrete GPUs (dGPUs), two integrated GPUs (iGPUs), and an Aurix TC297. The Drive PX 2 hardware platform is mounted in the trunk of the vehicle.

The proposed stereo perception software framework is developed in C++ on Ubuntu 16.04 LTS and deployed on the Drive PX 2 hardware platform with the DriveWorks 0.6.67, CUDA 9.0, and cuDNN 7.3.0 libraries.

VII. EXPERIMENTAL RESULTS


In this Section, we show the intermediate results of the proposed framework, depicted in Figure 4, to give a better overview of how it works. The acquired left and right images from the stereo camera are displayed in Figure 4a and Figure 4b. The rectified left and right images from the stereo rectifier are displayed in Figure 4c and 4d. The computed confidence map with respect to the left disparity map from the stereo disparity module is displayed in Figure 4e. The detected objects on the road along with the stereo depth are displayed in Figure 4f. The recognized lane markings on the road along with their classification are displayed in Figure 4g. The identified drivable free space on the road along with the obstacle classification is displayed in Figure 4h. The detected objects along with their depth, the recognized lane markings, and the identified free space, produced simultaneously on the road by the proposed stereo perception framework, are shown in Figure 4i.

Fig. 4: Experimental Results: Proposed Stereo Perception Framework. (a) Left stereo input image. (b) Right stereo input image. (c) Left stereo rectified image. (d) Right stereo rectified image. (e) Left stereo disparity map. (f) Result of objects with depth. (g) Result of lane detection. (h) Result of free space detection. (i) Result of proposed stereo perception framework.

VIII. PERFORMANCE EVALUATION


In this Section, we evaluate the performance of the proposed stereo perception framework by analyzing the depth estimation output for the detected objects and also its processing time on two different platforms.

A. Depth Estimation

We compare the depth output of the framework with the known distance of three different objects: a vehicle, a bicycle, and a person. The computed confidence map of the vehicle with respect to the left disparity map is displayed in Figure 5a, and the depth estimation of the vehicle is shown in Figure 5b. The computed confidence map of the bicycle with respect to the left disparity map is displayed in Figure 5c, and the depth estimation of the bicycle is shown in Figure 5d. The computed confidence map of the person with respect to the left disparity map is displayed in Figure 5e, and the depth estimation of the person is shown in Figure 5f. The depth estimation results, along with the actual depth, estimated depth, and depth error, are summarized in Table I.

Fig. 5: Experimental Results: Proposed Depth Estimation. (a) Car stereo disparity map. (b) Car depth estimation. (c) Bicycle stereo disparity map. (d) Bicycle depth estimation. (e) Person stereo disparity map. (f) Person depth estimation.

TABLE I: Depth Estimation Results (in meters).

Objects    Actual Depth    Estimated Depth    Depth Error
Car           13.00             12.41             0.59
Bicycle       10.00              9.89             0.11
Person         7.00              6.71             0.29

B. Processing Time

We compare the processing time of the proposed stereo perception framework with that of the individual Nvidia DNNs (DriveNet, LaneNet, and OpenRoadNet) on the Ubuntu 16.04 and Drive PX 2 platforms; the results are shown in Table II. The processing time of the proposed framework is 99 ms (10.1 Hz) on a laptop with a Quadro M1200 GPU and a quad-core Intel Core i7 CPU running Ubuntu 16.04 (x86_64 architecture) and 134 ms (7.4 Hz) on the Drive PX 2 platform (aarch64 architecture), which is suitable for various low-speed ADAS applications.

TABLE II: Performance of frameworks (in milliseconds).

Platform (architecture)    Nvidia DriveNet    Nvidia LaneNet    Nvidia OpenRoadNet    Stereo Perception
Ubuntu 16.04 (x86_64)             38                  09                 07                   99
Drive PX 2 (aarch64)              34                  06                 04                  134
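The reported rates follow from the per-frame latency (for example, 1000 / 134 is approximately 7.4 Hz). A minimal, illustrative way to collect such per-frame numbers around any frame-processing call (such as the processFrame sketch in Section V) is shown below.

```cpp
#include <chrono>
#include <cstdio>

// Measures the wall-clock latency of one pipeline iteration and reports it in
// milliseconds and the corresponding rate in Hz (1000 / latency_ms).
template <typename Fn>
double timeFrameMs(Fn&& processOneFrame)
{
    const auto start = std::chrono::steady_clock::now();
    processOneFrame();
    const auto end = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("frame: %.1f ms (%.1f Hz)\n", ms, 1000.0 / ms);
    return ms;
}
```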

IX. CONCLUSIONS

In this paper, we proposed and developed a stereo perception framework for autonomous vehicles that runs in real-time on the Nvidia Drive PX 2 platform. We use images from a custom-made stereo camera manufactured using two AR0231 GMSL cameras as input to the framework. The framework processes the stereo image pair to detect objects and estimate their depth, recognize lane boundaries, and identify drivable space simultaneously. It is deployed and tested on the Drive PX 2 platform in our prototype research vehicle to demonstrate its practical feasibility in a real-time environment. The framework runs at 7.4 Hz on the Drive PX 2 platform, which is suitable for various low-speed ADAS applications.

ACKNOWLEDGMENT

This research work is part of the i-CAVE (integrated cooperative automated vehicles) research programme within the Sensing, Mapping and Localization project (project number 363265/10024085). This i-CAVE programme is funded by NWO (Netherlands Organisation for Scientific Research).

REFERENCES

[1] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, "A survey of autonomous driving: common practices and emerging technologies," arXiv preprint arXiv:1906.05113, 2019.
[2] G. P. Stein, O. Mano, and A. Shashua, "Vision-based ACC with a single camera: bounds on range and range rate accuracy," in IEEE IV2003 Intelligent Vehicles Symposium. Proceedings (Cat. No.03TH8683), June 2003, pp. 120–125.
[3] E. Dagan, O. Mano, G. P. Stein, and A. Shashua, "Forward collision warning with a single camera," in IEEE Intelligent Vehicles Symposium, 2004, June 2004, pp. 37–42.
[4] M. Haloi and D. B. Jayagopi, "A robust lane detection and departure warning system," in 2015 IEEE Intelligent Vehicles Symposium (IV), June 2015, pp. 126–131.
[5] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, "Scalable object detection using deep neural networks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[6] P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. W. Schuller, and S. Zafeiriou, "End-to-end multimodal emotion recognition using deep neural networks," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1301–1309, Dec 2017.
[7] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[8] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in The IEEE International Conference on Computer Vision (ICCV), December 2015.
[10] H. Zhuang, R. Sudhakar, and J. Yu Shieh, "Depth estimation from a sequence of monocular images with known camera motion," Robotics and Autonomous Systems, vol. 13, no. 2, pp. 87–95, 1994. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0921889094900515
[11] Nvidia Drive - Autonomous Vehicle Development Platforms, https://developer.nvidia.com/drive/, [Online], 2019.
[12] Z. Zhang, "Camera calibration," Computer Vision: A Reference Guide, pp. 76–77, 2014.
[13] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, ser. Artificial Intelligence. MIT Press, 1993. [Online]. Available: https://books.google.nl/books?id=Aa6TTW9dWy0C
[14] J.-Y. Bouguet, "Camera calibration toolbox for Matlab (2008)," URL http://www.vision.caltech.edu/bouguetj/calib_doc, vol. 1080, 2008.
[15] P. Sturm, Pinhole Camera Model. Boston, MA: Springer US, 2014, pp. 610–613. [Online]. Available: https://doi.org/10.1007/978-0-387-31439-6_472
[16] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 2008.
[17] U. R. Dhond and J. K. Aggarwal, "Structure from stereo - a review," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 6, pp. 1489–1510, Nov 1989.
[18] G. Xu and Z. Zhang, Epipolar Geometry in Stereo, Motion, and Object Recognition: A Unified Approach. USA: Kluwer Academic Publishers, 1996.
[19] H. Hirschmuller and D. Scharstein, "Evaluation of cost functions for stereo matching," in 2007 IEEE Conference on Computer Vision and Pattern Recognition, June 2007, pp. 1–8.
[20] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[21] NVIDIA DriveWorks Development Guide, https://developer.nvidia.com/driveworks-docs/, [Online], 2019.
[22] Sekonix Camera Datasheets, https://developer.nvidia.com/driveworks/files/Sekonix_AR0231_2MP_SF332X_Automotive_GMSL_Camera_Datasheet_v2.2E.pdf, [Online], 2019.
