
A Stereo Perception Framework for Autonomous Vehicles

Narsimlu Kemsaram, Anweshan Das, and Gijs Dubbelman

Abstract— Stereo cameras are crucial sensors for self-driving vehicles as they are low-cost and can be used to estimate depth. They can be used for multiple purposes, such as object detection, depth estimation, semantic segmentation, etc. In this paper, we propose a stereo vision-based perception framework for autonomous vehicles. It uses three deep neural networks simultaneously to perform free-space detection, lane boundary detection, and object detection on image frames captured using the stereo camera. The depth of the detected objects from the vehicle is estimated from the disparity image computed using the two stereo image frames from the stereo camera. The proposed stereo perception framework runs at 7.4 Hz on the Nvidia Drive PX 2 hardware platform, which further allows for its use in multi-sensor fusion for localization, mapping, and path planning by autonomous vehicle applications.

Index Terms— advanced driver assistance system, autonomous vehicle, deep neural network, depth estimation, free space detection, lane detection, object detection, stereo camera, stereo perception, stereo vision.

Narsimlu Kemsaram ([email protected]), Anweshan Das ([email protected]), and Gijs Dubbelman ([email protected]) are with the Department of Electrical Engineering, Mobile Perception Systems Research Lab, Video Coding and Architectures/Signal Processing Systems Research Group, Eindhoven University of Technology (TU/e), 5612 AZ Eindhoven, The Netherlands.

I. INTRODUCTION

Fig. 1: Block Diagram of the Proposed Stereo Perception Framework.

Perceiving the environment accurately in real-time is one of the most challenging tasks for autonomous vehicles. Perception refers to the ability of the autonomous vehicle to collect sensor data, extract relevant knowledge, and develop a contextual understanding of the environment, for example, the detection of obstacles, lanes, and the drivable area in front of the vehicle [1]. Sensors such as cameras, lidars, and radars are used in autonomous driving vehicles to perceive the environment around them. LiDARs are very accurate active depth measurement sensors, but they are very expensive and not ready to be equipped on consumer-grade vehicles. Radars are robust sensors used in advanced driver assistance systems (ADAS) like adaptive cruise control (ACC), blind-spot detection, etc., but the resolution of radar is not high enough to extract semantic information about the surroundings. Cameras, on the other hand, are the only sensor that is comparatively inexpensive and provides a high level of detail, such as color, contrast, and texture information, which allows for a better semantic understanding of the environment and can also be used to estimate depth. With ever-increasing performance in dynamic lighting conditions and being relatively cheap to manufacture, camera-based systems are widely used in today's ADAS and autonomous vehicles. Monocular camera-based systems are generally used for ADAS applications such as ACC [2], forward collision warning (FCW) [3], lane departure warning (LDW) [4], etc.

In recent years there has been a lot of advancement in the field of deep learning. Deep neural networks (DNNs), specifically different variants of the convolutional neural network (CNN), have been used in the field of computer vision for complex tasks like object detection [5], pattern recognition [6], segmentation [7], depth estimation [8], etc. They even outperform human beings in tasks like object recognition and pattern recognition [9]. The main drawback of complex DNNs is that they are computationally very expensive and cannot meet real-time system constraints. But, with recent developments in graphics processing units (GPUs) and DNN-optimized hardware platforms, it is becoming easier to deploy DNNs for real-time applications.

Depth estimation of objects in the surroundings can be performed using a monocular camera or a stereo camera. There is no direct method to estimate depth from a monocular camera: the depth information of the surroundings can be estimated by assuming the surface to be planar or by using image feature tracking with additional information about the twist of the camera in subsequent frames [10]. DNN-based monocular depth estimation techniques are not robust enough to be used in automotive-grade applications. Stereo cameras use the intrinsic projective geometry between two views, which is independent of scene structure and only depends on the cameras' internal and external parameters. Stereo vision is widely used for depth estimation in the robotics and automotive domain.



In this paper, we propose and evaluate a stereo camera-based perception framework for autonomous vehicles. Figure 1 shows the block diagram of the stereo perception framework. The framework uses DNNs to perform lane boundary detection, free space detection, and object detection and classification on the left image frame of the stereo camera. It uses the left and right image frames of the stereo camera to compute a disparity image and then estimates the depth of the detected objects. The framework requires a lot of parallel processing power from GPUs. PCs with consumer-grade GPUs consume a lot of electricity, and vehicles do not produce that much electrical energy. We therefore use Nvidia's Drive PX 2 platform to deploy the framework [11]. The Drive PX 2 is a very powerful and efficient automotive-grade platform which can be used to deploy highly optimized DNNs for real-time applications.

The main contributions of this paper are: i) we developed and integrated a stereo camera-based perception framework for autonomous vehicles, ii) we developed a depth estimation module, and iii) we evaluated the performance of the depth estimation and the stereo perception framework in real-time.
This paper is structured as follows: Section II discusses the camera calibration process. Section III provides details of the stereo vision module. Section IV provides details of the DNN module. Section V explains the proposed stereo perception framework in detail. Section VI explains the experimental setup in detail. Section VII discusses the experimental results. Section VIII evaluates the performance of the depth estimation and the stereo perception framework. Section IX presents the conclusions.
II. CAMERA CALIBRATION

The process of estimating the intrinsic and extrinsic parameters of a camera is called camera calibration [12]. Since the manufacturing process of camera sensors and their lenses is never perfect, precise camera calibration is essential to re-project 2D images into the 3D world. The intrinsic parameters characterize the geometric, digital, and optical characteristics of the camera and are specific to each camera. They are composed of the principal point or optical center (c_x, c_y), the focal length (f_x, f_y), the pixel size (p_x, p_y), the skew coefficient (s), the camera image resolution, and the lens distortion coefficients (k_1, k_2, p_1, p_2). The extrinsic parameters represent the six degrees of freedom (6DoF) pose of the camera in the world and are given by a translation vector T and a rotation matrix R [13]. Stereo camera calibration is the process of estimating the intrinsic and extrinsic parameters of the two cameras in the stereo setup.

We use the Matlab stereo calibration toolbox [14] to perform stereo calibration. It uses multiple images of a checkerboard pattern captured using the stereo camera to estimate the intrinsic and extrinsic parameters; for more details on how the toolbox works, please refer to [14]. The calibration toolbox computes the extrinsic parameters of the stereo camera with the left camera's optical center as the origin, which then needs to be transformed to the vehicle coordinate system using a rigid body transformation. The vehicle uses a right-handed coordinate system, where the vehicle origin is considered to be under the center of the rear axle: the x-axis points forward to the front of the vehicle, the y-axis points to the left of the vehicle, and the z-axis points upward. The camera also has a right-handed coordinate system, where the camera origin is at the optical center of the left camera: the x-axis points to the right of the image plane, the y-axis points to the bottom of the image plane, and the z-axis points forward along the optical axis. The estimated intrinsic and transformed extrinsic parameters are written to a rig configuration file in XML format, which is used in the stereo perception framework.
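The rig configuration groups these quantities per camera. The exact DriveWorks rig XML schema is not reproduced here; the following C++ sketch only illustrates, with assumed field names, the calibration data such a file has to carry.

```cpp
#include <array>
#include <cstdint>

// Illustrative grouping of the calibration parameters described above.
// Field names are assumptions for this sketch, not the DriveWorks rig schema.
struct Intrinsics {
    double fx, fy;                     // focal length in pixels
    double cx, cy;                     // principal point (optical center)
    double skew;                       // skew coefficient s
    uint32_t width, height;            // camera image resolution
    std::array<double, 4> distortion;  // lens distortion coefficients k1, k2, p1, p2
};

struct Extrinsics {                    // 6DoF pose of the camera in the vehicle frame
    std::array<double, 9> R;           // 3x3 rotation matrix, row-major
    std::array<double, 3> T;           // translation vector, in meters
};

struct StereoRig {
    Intrinsics left, right;            // per-camera intrinsics
    Extrinsics leftCameraToVehicle;    // rigid body transform produced after calibration
    double baselineMeters;             // distance between the two optical centers
};
```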
III. STEREO VISION

The stereo vision module computes the depth of the detected objects in the stereo perception framework. It is divided into three parts, namely stereo rectification, disparity computation, and depth estimation.

A. Stereo Rectification

The left and right image frames of the stereo camera are undistorted and rectified before computing disparity. The left and right cameras are modeled as pinhole cameras [15]. Image undistortion refers to the process of removing lens distortion artifacts from the image. Lens distortion is modeled as two types, namely radial distortion and tangential distortion. Straight lines in an image appear curved due to radial distortion, and the effect becomes more prominent as we move away from the center of the image; it is represented using two coefficients k_1, k_2. Tangential distortion occurs if the lens is not mounted parallel to the image sensor; it is represented using two coefficients p_1, p_2. Consider a point in 3D that, when projected onto the camera image plane, is represented as (x_1, y_1) but, due to lens distortion, appears at (x_2, y_2). The distorted point is described by the following functions [16]:

x_2 = x_1 (1 + k_1 r^2 + k_2 r^4) + 2 p_1 x_1 y_1 + p_2 (r^2 + 2 x_1^2)    (1)

y_2 = y_1 (1 + k_1 r^2 + k_2 r^4) + 2 p_2 x_1 y_1 + p_1 (r^2 + 2 y_1^2)    (2)

where r is the distance of the pixel from the optical centre, r = sqrt(x_1^2 + y_1^2). These coefficients are estimated during the camera calibration process.
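Equations (1) and (2) define the forward mapping from an ideal (undistorted) point to its observed, distorted location. A minimal C++ sketch of that mapping is shown below; undistorting an image inverts this mapping, typically through a precomputed remap table.

```cpp
// Applies the radial-tangential distortion model of Eqs. (1)-(2) to an
// undistorted, normalized image point (x1, y1) and returns the distorted
// point (x2, y2). Coefficient names follow the paper (k1, k2, p1, p2).
struct Point2d { double x, y; };

Point2d distortPoint(const Point2d& p, double k1, double k2, double p1, double p2)
{
    const double x1 = p.x, y1 = p.y;
    const double r2 = x1 * x1 + y1 * y1;            // r^2, r = distance to the optical centre
    const double radial = 1.0 + k1 * r2 + k2 * r2 * r2;
    Point2d d;
    d.x = x1 * radial + 2.0 * p1 * x1 * y1 + p2 * (r2 + 2.0 * x1 * x1);  // Eq. (1)
    d.y = y1 * radial + 2.0 * p2 * x1 * y1 + p1 * (r2 + 2.0 * y1 * y1);  // Eq. (2)
    return d;
}
```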
The undistorted left and right camera images are then rectified. Image rectification is a transformation process that projects the left and right camera images onto a common plane parallel to the line between the optical centers of the cameras, such that the epipolar lines become collinear and parallel to the horizontal image axes. In simple words, the transformation is such that the projection of every point in the left image has its corresponding point in the right image on the same horizontal line, collinear and parallel to the horizontal image axes. This limits the search space for stereo correspondence [17]. For more details on the image rectification process, refer to [18].
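The framework itself performs this step with the Nvidia DriveWorks SDK (Section V). Purely as an illustrative alternative, assuming the calibration parameters of Section II are available, the same undistort-and-rectify step can be expressed with OpenCV's stereo routines:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>

// Illustrative undistortion + rectification with OpenCV (not the DriveWorks path
// used in the paper). K_l, K_r are 3x3 intrinsic matrices, d_l, d_r hold
// (k1, k2, p1, p2), and R, T describe the right camera relative to the left.
void rectifyPair(const cv::Mat& left, const cv::Mat& right,
                 const cv::Mat& K_l, const cv::Mat& d_l,
                 const cv::Mat& K_r, const cv::Mat& d_r,
                 const cv::Mat& R, const cv::Mat& T,
                 cv::Mat& leftRect, cv::Mat& rightRect)
{
    cv::Mat R1, R2, P1, P2, Q;
    cv::stereoRectify(K_l, d_l, K_r, d_r, left.size(), R, T, R1, R2, P1, P2, Q);

    cv::Mat map1l, map2l, map1r, map2r;
    cv::initUndistortRectifyMap(K_l, d_l, R1, P1, left.size(),  CV_32FC1, map1l, map2l);
    cv::initUndistortRectifyMap(K_r, d_r, R2, P2, right.size(), CV_32FC1, map1r, map2r);

    // After remapping, corresponding points lie on the same image row.
    cv::remap(left,  leftRect,  map1l, map2l, cv::INTER_LINEAR);
    cv::remap(right, rightRect, map1r, map2r, cv::INTER_LINEAR);
}
```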

B. Stereo Disparity

The disparity of all pixels, also known as the disparity map, is computed using the stereo image pair. The disparity is the distance between two corresponding points in the left and right images of a stereo pair. The problem of finding pixel correspondences between a stereo image pair is called stereo matching. After image rectification, the search space for the corresponding pixel is constrained to the epipolar line. Pixels are matched by comparing the sum of absolute differences (SAD), the sum of squared differences (SSD), or the normalized cross-correlation (NCC) of the intensities of the pixels around them [19]. The disparity d of each pixel is represented as:

d = u_l - u_r    (3)

where u_l and u_r are the horizontal positions of the corresponding pixel on the left and right image planes.
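As an illustration of the matching principle only (the framework uses the DriveWorks disparity module, see Section V), a brute-force SAD search for a single left-image pixel on a rectified pair could look as follows; the window size and disparity range are assumed parameters.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// SAD block matching for one left-image pixel (u, v) on a rectified pair:
// corresponding pixels lie on the same row, so the search runs over candidate
// disparities only. Production implementations are far more elaborate; this
// only demonstrates the principle behind Eq. (3).
int matchDisparitySAD(const std::vector<uint8_t>& left, const std::vector<uint8_t>& right,
                      int width, int height, int u, int v,
                      int maxDisparity, int window /* half window size */)
{
    int bestDisparity = 0;
    long bestCost = std::numeric_limits<long>::max();
    for (int d = 0; d <= maxDisparity && u - d - window >= 0; ++d) {
        long cost = 0;  // sum of absolute intensity differences over the window
        for (int dy = -window; dy <= window; ++dy) {
            const int y = v + dy;
            if (y < 0 || y >= height) continue;
            for (int dx = -window; dx <= window; ++dx) {
                const int xl = u + dx, xr = u - d + dx;
                if (xl < 0 || xl >= width || xr < 0 || xr >= width) continue;
                cost += std::labs(static_cast<long>(left[y * width + xl]) -
                                  static_cast<long>(right[y * width + xr]));
            }
        }
        if (cost < bestCost) { bestCost = cost; bestDisparity = d; }
    }
    return bestDisparity;  // d = u_l - u_r, as in Eq. (3)
}
```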
C. Depth Estimation

The output of the object detection and tracking module, which is explained in Section IV-A, is a bounding box around each detected object along with its label. The pixel disparity values of the detected objects are available from the computed disparity map. The depth of a pixel is estimated from its disparity value using the triangulation equation from the stereo geometry [20]:

z = f * b / d    (4)

where z is the depth, f is the focal length, b is the baseline distance between the left and right cameras, and d is the disparity.
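Equation (4) translates directly into code. In the sketch below, f is expressed in pixels and b in meters, so z comes out in meters; larger disparities correspond to closer objects.

```cpp
// Direct implementation of Eq. (4): z = f * b / d.
// focalPx: focal length in pixels, baselineM: baseline in meters,
// disparityPx: disparity in pixels for the pixel of interest.
inline float depthFromDisparity(float focalPx, float baselineM, float disparityPx)
{
    if (disparityPx <= 0.0f) return -1.0f;  // invalid disparity or point at infinity
    return focalPx * baselineM / disparityPx;
}
```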
IV. DEEP NEURAL NETWORK

This part of the framework processes image frames to understand the surroundings of the autonomous vehicle. It consists of three modules, namely object detection and tracking, lane detection, and free space detection, which are explained in detail below.

A. Object Detection and Tracking

The object detection and tracking module is used to provide semantic information about the surroundings of the autonomous vehicle. This module consists of three parts: object detection, object clustering, and object tracking. We use Nvidia's proprietary DNN called DriveNet [21] to perform object detection. The input to the object detection network is an RCCB (Red-Clear-Clear-Blue) image and the output is object proposals with bounding boxes. Each object can have multiple proposals; the object clustering algorithm clusters these multiple proposals into one bounding box for each detected object. The object tracking algorithm tracks the detected bounding boxes to maintain temporal consistency. The module detects and tracks six different classes of objects: car, truck, person, bicycle, traffic sign, and road sign. It overlays bounding boxes on the detected objects, and the colors of the bounding boxes represent the detected classes as follows: red for cars, cyan for trucks, green for persons, blue for bicycles, yellow for traffic signs, and magenta for road signs.
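The clustering and tracking steps belong to Nvidia's proprietary pipeline and are not specified in detail here. Purely to illustrate the idea of merging multiple proposals into one box per object, a generic greedy grouping by intersection-over-union (IoU) could look as follows; the threshold and the box-averaging rule are assumptions, not DriveNet's actual algorithm.

```cpp
#include <algorithm>
#include <vector>

// Greedy IoU-based grouping of detection proposals into one box per object.
// This is a generic illustration only, not Nvidia's clustering algorithm.
struct Rect { float x, y, w, h; };

static float iou(const Rect& a, const Rect& b)
{
    const float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Proposals that overlap an already-kept box by more than `threshold` are
// absorbed into it (by averaging); otherwise they start a new cluster.
std::vector<Rect> clusterProposals(const std::vector<Rect>& proposals, float threshold = 0.5f)
{
    std::vector<Rect> clusters;
    for (const Rect& p : proposals) {
        bool merged = false;
        for (Rect& c : clusters) {
            if (iou(p, c) > threshold) {
                c.x = 0.5f * (c.x + p.x); c.y = 0.5f * (c.y + p.y);
                c.w = 0.5f * (c.w + p.w); c.h = 0.5f * (c.h + p.h);
                merged = true;
                break;
            }
        }
        if (!merged) clusters.push_back(p);
    }
    return clusters;
}
```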
B. Lane Detection

A robust and accurate lane detection system is crucial to ADAS systems like the Lane Keep Assist System (LKAS) and the Lane Departure Warning System (LDWS), and also provides vital information to autonomous vehicles. We use Nvidia's proprietary DNN called LaneNet [21] to perform lane detection. The input to this network is an RCCB (Red-Clear-Clear-Blue) image and the output is polylines representing lane markings. It calculates a probability map of lane markings for each pixel using an encoder-decoder architecture on the input image. The map is then binarized into clusters of lane markings, through which polylines are fitted to assign lane position types. It recognizes four different types of lane markings, namely left adjacent-lane, left ego-lane, right ego-lane, and right adjacent-lane, when they are present on the road. The lane detection module overlays polylines on the detected lane markings, and the colors of the polylines represent the lane marking types as follows: yellow for the left adjacent-lane, red for the left ego-lane, green for the right ego-lane, and blue for the right adjacent-lane.

C. Free Space Detection

Free space detection provides critical information about the drivable space to the navigation system of an autonomous vehicle. We use Nvidia's proprietary DNN called OpenRoadNet [21] to perform free space detection. The input to the network is an RCCB (Red-Clear-Clear-Blue) image and the output is a boundary across the image from left to right. The boundary separates the obstacles from the open road space. Each pixel on the boundary is associated with one of four semantic labels: red for vehicle, blue for pedestrian, green for curb, and yellow for other.

V. PROPOSED STEREO PERCEPTION FRAMEWORK

In this Section, we present the stereo vision-based perception framework for autonomous vehicles. The functional architecture of the stereo perception framework that is deployed on the Nvidia Drive PX 2 platform is shown in Figure 2.

The input to the framework is a synchronized raw stereo image pair from a stereo camera or a video file. We use a custom-made stereo camera manufactured using two AR0231 GMSL cameras. The stereo camera is calibrated using a stereo calibration tool, as mentioned in Section II, and the camera calibration parameters are read from the rig configuration file during the initialization of the framework. Camera synchronization is guaranteed as the ports on which the two cameras are connected are hardware synchronized. The images from the cameras are in Bayer RCCB (Red-Clear-Clear-Blue) format, which is converted to RGBA (Red-Green-Blue-Alpha) format before the rectification process. The left and right images are then undistorted and rectified, as explained in Section III-A. We use the stereo rectification functionality provided in the Nvidia DriveWorks software development kit (SDK) to perform this task. The rectified left and right camera images are converted to gray-scale images, and a pyramid of Gaussian images is built up to a specified level.

Fig. 2: Functional Architecture of the Proposed Stereo Perception Framework.

The level 0 image of the pyramid, i.e., the full-resolution gray-scale image, is used for disparity computation. We use the SSD pixel matching technique to find the stereo correspondence of every pixel of the left image with the right image and compute the disparity map with respect to the left image, as explained in Section III-B. The Nvidia DriveWorks SDK provides the disparity computation library, and it returns both the disparity map and the disparity confidence map of the left image. The disparity and confidence maps are used to generate a colored disparity map, which is displayed as output, with invalid pixels displayed in black.

The rectified left camera image from the stereo rectifier is passed as input to the DNN module, as explained in Section IV. The object detector and tracker described in Section IV-A outputs the regions of interest of the detected objects as bounding boxes with their classes. The depth estimator computes the depth of each detected object by utilizing the computed disparity map of the left image. We compute the disparity of each object as the average disparity over one third of the bounding box area around its centre. This filters out outliers near the edges of the bounding box. The depth of each object is then computed as explained in Section III-C.
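One way to realize this central-region averaging is sketched below: the valid disparities inside a centered window covering roughly one third of the bounding-box area are averaged and then triangulated with Eq. (4). The exact window shape used by the framework is an assumption here.

```cpp
#include <cmath>
#include <vector>

// Per-object depth: average valid disparities inside a centered sub-window of
// ~1/3 of the bounding-box area, then apply Eq. (4): z = f * b / d.
struct BoundingBox { int x, y, width, height; };   // top-left corner + size, in pixels

float objectDepthMeters(const std::vector<float>& disparity, int imgWidth, int imgHeight,
                        const BoundingBox& box, float focalPx, float baselineM)
{
    // Scale each side by 1/sqrt(3) so the centered window keeps ~1/3 of the area.
    const float s = 1.0f / std::sqrt(3.0f);
    const int w = static_cast<int>(box.width * s);
    const int h = static_cast<int>(box.height * s);
    const int x0 = box.x + (box.width - w) / 2;
    const int y0 = box.y + (box.height - h) / 2;

    double sum = 0.0;
    int count = 0;
    for (int v = y0; v < y0 + h; ++v) {
        for (int u = x0; u < x0 + w; ++u) {
            if (u < 0 || v < 0 || u >= imgWidth || v >= imgHeight) continue;
            const float d = disparity[v * imgWidth + u];
            if (d > 0.0f) { sum += d; ++count; }   // skip invalid (non-positive) disparities
        }
    }
    if (count == 0) return -1.0f;                  // no valid disparity inside the window
    const float meanDisparity = static_cast<float>(sum / count);
    return focalPx * baselineM / meanDisparity;    // Eq. (4)
}
```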
The lane detector described in Section IV-B can classify four different lane markings within an image and overlays the recognized lane markings on the output image. The free space detector described in Section IV-C identifies the drivable collision-free space within the image and overlays the identified drivable area with a separation boundary on the output image.

The output image is converted to an OpenGL image, and the outputs of the object detector and tracker, depth estimator, lane detector, and free space detector are overlaid on it before the image rendering process. The image rendering process renders the results from the previous modules in a meaningful way to the user through the in-vehicle Tegra A/Tegra B HDMI computer monitors.
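The per-frame control flow described in this Section can be summarized by the following skeleton. It is only an illustration of the ordering of the stages; all types and stage functions are placeholders supplied by the caller, and the real implementation relies on the DriveWorks SDK and the Nvidia DNNs.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Placeholder types for this sketch; not the DriveWorks data structures.
struct Image { int width = 0, height = 0; std::vector<uint8_t> data; };
struct Box { int x = 0, y = 0, w = 0, h = 0; int classId = 0; float depthMeters = 0.0f; };

struct StereoPerceptionPipeline {
    // Stages, in the order the framework applies them to each synchronized frame pair.
    std::function<void(Image&, Image&)> rectify;                  // Section III-A
    std::function<Image(const Image&, const Image&)> disparity;   // Section III-B (left disparity)
    std::function<std::vector<Box>(const Image&)> detectObjects;  // DriveNet-like detector
    std::function<void(const Image&)> detectLanes;                // LaneNet-like detector
    std::function<void(const Image&)> detectFreeSpace;            // OpenRoadNet-like detector
    std::function<float(const Image&, const Box&)> estimateDepth; // Section III-C
    std::function<void(const Image&, const std::vector<Box>&)> render;

    void processFrame(Image left, Image right) {
        rectify(left, right);                       // undistort + rectify both images
        Image disp = disparity(left, right);        // disparity w.r.t. the left image
        std::vector<Box> objects = detectObjects(left);
        for (Box& b : objects)
            b.depthMeters = estimateDepth(disp, b); // depth per detected object
        detectLanes(left);                          // lane polylines for the overlay
        detectFreeSpace(left);                      // drivable-space boundary for the overlay
        render(left, objects);                      // compose and display the output image
    }
};
```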
VI. EXPERIMENTAL SETUP

The TU/e-TASS International highly automated driving research prototype vehicle, based on a 3rd generation hybrid Toyota Prius, is used to deploy and demonstrate the proposed stereo perception framework. This vehicle is equipped with a GMSL (Gigabit Multimedia Serial Link) stereo camera, an Nvidia Drive PX 2 hardware platform, an Ubuntu 16.04 based computer with an Intel Core i7-7700K CPU, an Nvidia Titan XP GPU, and 32 GB RAM, two LG 21-inch 60 Hz 1920x1080 Full HD IPS LCD HDMI computer monitors with HDMI cables to connect to the Drive PX 2, a Logitech K400 Plus wireless touch keyboard, a 10 Gigabit Ethernet switch, a Huawei 3G/4G/WiFi modem/router to provide internet access to the computer and the Drive PX 2, and CAT7 cables supporting the 10 Gigabit Ethernet protocol to connect the computer and the Drive PX 2 via the Ethernet switch. All of the equipment is powered using a 2000 W 12 V DC to 220 V AC converter connected to the battery of the vehicle. Figure 3 shows the prototype vehicle along with the used hardware components.

Fig. 3: An Autonomous Research Vehicle Platform: (a) A Toyota Prius Vehicle equipped with (b) Nvidia Drive PX 2 (in the trunk of the vehicle), (c) GMSL Stereo Camera (at the rear-view mirror), and (d) HDMI Computer Monitors (at the back side of the vehicle front seats).

We use a custom-built stereo camera, which is composed of two identical Sekonix SF3323 GMSL automotive cameras with an ONSEMI CMOS AR0231 image sensor [22], a 1928x1208 resolution (2.3 Megapixel), a 60 degree field of view (FOV), a 5.8 mm focal length, a baseline of 30 cm, and a FAKRA (Fachkreis Automobil, a German standard) connector. The stereo camera is firmly fixed using a rigid mounting bar, high up at the rear-view mirror position, at the inner center of the windshield, aligning the camera center vertically with the horizon.

We use the Nvidia Drive PX 2 AutoChauffeur as the embedded hardware platform. It contains two Parker SoCs (System on Chip), called Tegra A and Tegra B, two discrete GPUs (dGPUs), two integrated GPUs (iGPUs), and an Aurix TC297. The Drive PX 2 hardware platform is mounted in the trunk of the vehicle.

The proposed stereo perception software framework is developed in C++ on Ubuntu 16.04 LTS and deployed on the Drive PX 2 hardware platform with the DriveWorks 0.6.67, CUDA 9.0, and cuDNN 7.3.0 libraries.

VII. EXPERIMENTAL RESULTS


In this Section, we show the intermediate results of the proposed framework, depicted in Figure 4, to give a better overview of how it works. The acquired left and right images from the stereo camera are displayed in Figure 4a and Figure 4b. The rectified left and right images from the stereo rectifier are displayed in Figure 4c and 4d. The computed confidence map with respect to the left disparity map from the stereo disparity module is displayed in Figure 4e. The detected objects on the road along with the stereo depth are displayed in Figure 4f. The recognized lane markings on the road along with their classification are displayed in Figure 4g. The identified drivable free space on the road along with the obstacle classification is displayed in Figure 4h. The detected objects along with their depth, the recognized lane markings, and the identified free space, produced simultaneously on the road by the proposed stereo perception framework, are shown in Figure 4i.

Fig. 4: Experimental Results: Proposed Stereo Perception Framework. (a) Left stereo input image. (b) Right stereo input image. (c) Left stereo rectified image. (d) Right stereo rectified image. (e) Left stereo disparity map. (f) Result of objects with depth. (g) Result of lane detection. (h) Result of free space detection. (i) Result of proposed stereo perception framework.

VIII. PERFORMANCE EVALUATION


In this Section, we evaluate the performance of the proposed stereo perception framework by analyzing the depth estimation output for the detected objects and also its processing time on two different platforms.

A. Depth Estimation

We compare the depth output of the framework with the known distance of three different objects: a vehicle, a bicycle, and a person. The computed confidence map of the vehicle with respect to the left disparity map is displayed in Figure 5a, and the depth estimation of the vehicle is shown in Figure 5b. The computed confidence map of the bicycle with respect to the left disparity map is displayed in Figure 5c, and the depth estimation of the bicycle is shown in Figure 5d. The computed confidence map of the person with respect to the left disparity map is displayed in Figure 5e, and the depth estimation of the person is shown in Figure 5f. The depth estimation results, along with the actual depth, estimated depth, and depth error, are summarized in Table I.

Fig. 5: Experimental Results: Proposed Depth Estimation. (a) Car stereo disparity map. (b) Car depth estimation. (c) Bicycle stereo disparity map. (d) Bicycle depth estimation. (e) Person stereo disparity map. (f) Person depth estimation.

TABLE I: Depth Estimation Results (in meters).

Objects    Actual Depth    Estimated Depth    Depth Error
Car           13.00             12.41             0.59
Bicycle       10.00              9.89             0.11
Person         7.00              6.71             0.29

B. Processing Time

We compare the processing time of the proposed stereo perception framework with that of the individual Nvidia DNNs (DriveNet, LaneNet, and OpenRoadNet) on the Ubuntu 16.04 and Drive PX 2 platforms; the results are shown in Table II. The processing time of the proposed framework is 99 ms (10.1 Hz) on a laptop with a Quadro M1200 GPU and a quad-core Intel Core i7 CPU running Ubuntu 16.04 (x86_64 architecture) and 134 ms (7.4 Hz) on the Drive PX 2 platform (aarch64 architecture), which is suitable for various low-speed ADAS applications.

TABLE II: Performance of frameworks (in milliseconds).

Platform (architecture)    Nvidia DriveNet    Nvidia LaneNet    Nvidia OpenRoadNet    Stereo Perception
Ubuntu 16.04 (x86_64)             38                  09                 07                   99
Drive PX 2 (aarch64)              34                  06                 04                  134
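The reported rates follow from the per-frame latency (for example, 1000 / 134 is approximately 7.4 Hz). A minimal, illustrative way to collect such per-frame numbers around any frame-processing call (such as the processFrame sketch in Section V) is shown below.

```cpp
#include <chrono>
#include <cstdio>

// Measures the wall-clock latency of one pipeline iteration and reports it in
// milliseconds and the corresponding rate in Hz (1000 / latency_ms).
template <typename Fn>
double timeFrameMs(Fn&& processOneFrame)
{
    const auto start = std::chrono::steady_clock::now();
    processOneFrame();
    const auto end = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("frame: %.1f ms (%.1f Hz)\n", ms, 1000.0 / ms);
    return ms;
}
```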

IX. CONCLUSIONS

In this paper, we proposed and developed a stereo perception framework for autonomous vehicles that runs in real-time on the Nvidia Drive PX 2 platform. We use images from a custom-made stereo camera manufactured using two AR0231 GMSL cameras as input to the framework. The framework processes the stereo image pair to detect objects and estimate their depth, recognize lane boundaries, and identify drivable space simultaneously. It is deployed and tested on the Drive PX 2 platform in our prototype research vehicle to demonstrate its practical feasibility in a real-time environment. The framework runs at 7.4 Hz on the Drive PX 2 platform, which is suitable for various low-speed ADAS applications.

ACKNOWLEDGMENT

This research work is part of the i-CAVE (integrated cooperative automated vehicles) research programme within the Sensing, Mapping and Localization project (project number 363265/10024085). This i-CAVE programme is funded by NWO (Netherlands Organisation for Scientific Research).

REFERENCES

[1] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, "A survey of autonomous driving: common practices and emerging technologies," arXiv preprint arXiv:1906.05113, 2019.
[2] G. P. Stein, O. Mano, and A. Shashua, "Vision-based ACC with a single camera: bounds on range and range rate accuracy," in IEEE IV2003 Intelligent Vehicles Symposium. Proceedings (Cat. No.03TH8683), June 2003, pp. 120–125.
[3] E. Dagan, O. Mano, G. P. Stein, and A. Shashua, "Forward collision warning with a single camera," in IEEE Intelligent Vehicles Symposium, 2004, June 2004, pp. 37–42.
[4] M. Haloi and D. B. Jayagopi, "A robust lane detection and departure warning system," in 2015 IEEE Intelligent Vehicles Symposium (IV), June 2015, pp. 126–131.
[5] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, "Scalable object detection using deep neural networks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[6] P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. W. Schuller, and S. Zafeiriou, "End-to-end multimodal emotion recognition using deep neural networks," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1301–1309, Dec 2017.
[7] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[8] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised monocular depth estimation with left-right consistency," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in The IEEE International Conference on Computer Vision (ICCV), December 2015.
[10] H. Zhuang, R. Sudhakar, and J. Yu Shieh, "Depth estimation from a sequence of monocular images with known camera motion," Robotics and Autonomous Systems, vol. 13, no. 2, pp. 87–95, 1994. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0921889094900515
[11] Nvidia Drive - Autonomous Vehicle Development Platforms, https://developer.nvidia.com/drive/, [Online], 2019.
[12] Z. Zhang, "Camera calibration," Computer Vision: A Reference Guide, pp. 76–77, 2014.
[13] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, ser. Artificial Intelligence. MIT Press, 1993. [Online]. Available: https://books.google.nl/books?id=Aa6TTW9dWy0C
[14] J.-Y. Bouguet, "Camera calibration toolbox for Matlab (2008)," URL http://www.vision.caltech.edu/bouguetj/calib_doc, vol. 1080, 2008.
[15] P. Sturm, Pinhole Camera Model. Boston, MA: Springer US, 2014, pp. 610–613. [Online]. Available: https://doi.org/10.1007/978-0-387-31439-6_472
[16] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc., 2008.
[17] U. R. Dhond and J. K. Aggarwal, "Structure from stereo - a review," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 6, pp. 1489–1510, Nov 1989.
[18] G. Xu and Z. Zhang, Epipolar Geometry in Stereo, Motion, and Object Recognition: A Unified Approach. USA: Kluwer Academic Publishers, 1996.
[19] H. Hirschmuller and D. Scharstein, "Evaluation of cost functions for stereo matching," in 2007 IEEE Conference on Computer Vision and Pattern Recognition, June 2007, pp. 1–8.
[20] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[21] NVIDIA DriveWorks Development Guide, https://developer.nvidia.com/driveworks-docs/, [Online], 2019.
[22] Sekonix Camera Datasheets, https://developer.nvidia.com/driveworks/files/Sekonix_AR0231_2MP_SF332X_Automotive_GMSL_Camera_Datasheet_v2.2E.pdf, [Online], 2019.
