
AUTONOMOUS DRIVING SIMULATION

GUIDED BY: DR. D. BALAKRISHNAN
Department of CSE
Kalasalingam Academy of Research and Education
Krishnan Kovil, India
[email protected]

R. SASI PREETHAM
Department of CSE
Kalasalingam Academy of Research and Education
Krishnan Kovil, India
[email protected]

Y. NARENDER
Department of CSE
Kalasalingam Academy of Research and Education
Krishnan Kovil, India
[email protected]

D. SIVA KUMAR
Department of CSE
Kalasalingam Academy of Research and Education
Krishnan Kovil, India
venkatasivakumar29@gmail.com

S. VENKATA RAJA
Department of CSE
Kalasalingam Academy of Research and Education
Krishnan Kovil, India
sankhavaramvenkataraja@gmail.com

ABSTRACT

To promptly transform raw images captured by a single front-facing camera into driving commands, we created a convolutional neural network (CNN). This end-to-end approach proved to be surprisingly effective. The system is capable of navigating through traffic on local streets, whether lane markings are present or not, as well as on highways, utilizing minimal human training data. It also performs well in environments like parking lots and gravel paths where visual guidance is limited.

By relying solely on the human steering angle as the training input, the system autonomously develops internal representations of the necessary processing tasks, such as recognizing important road features. For example, we never explicitly trained it to identify road boundaries.

Our end-to-end method enhances all processing steps simultaneously, rather than breaking the problem down into parts like lane marking recognition, path planning, and vehicle control. We believe this will ultimately lead to smaller systems and better performance. Instead of focusing on human-chosen criteria like lane detection, the system's internal components will optimize themselves to enhance overall performance. While these criteria are easy for people to understand, they don't always guarantee the best results. By learning to address the problem with the fewest processing steps, we can create smaller networks.

For training, we utilized an NVIDIA DevBox and Torch 7, and for determining where to drive, we employed an NVIDIA DRIVE™ PX self-driving car computer running Torch 7.

Introduction

Convolutional neural networks (CNNs) have revolutionized the field of pattern recognition [1], [2]. Previously, most pattern recognition tasks relied on classifiers following a manual feature extraction phase. CNNs are groundbreaking because they autonomously learn features from training data. The convolution operation effectively captures the two-dimensional nature of images, making CNNs particularly powerful for image recognition tasks. Moreover, they require relatively few parameters to be learned compared to the vast number of operations involved in analyzing an entire image with convolutional filters.

Although CNNs with learned features have been commercially utilized for over two decades [3], two significant developments have led to a surge in their application in recent years. First, there are now extensive, labeled datasets available for training and validation, such as the Large Scale Visual Recognition Challenge (ILSVRC) [4]. Second, the implementation of CNN training algorithms on highly parallel graphics processing units (GPUs) has significantly accelerated both learning and inference.

In this research, we introduce a CNN that does more than merely identify patterns; it comprehends the entire processing pipeline necessary for steering a vehicle. More than a decade ago, a small radio-controlled (RC) car navigated a debris-laden alley as part of the Defense Advanced Research Projects Agency's (DARPA) Autonomous Vehicle (DAVE) seedling initiative [5]. This project served as the foundation for our current work. The training data for DAVE was generated by recording hours of human driving in similar but distinct environments, combining a human operator's left and right steering commands with footage from two cameras.

The pioneering efforts of Pomerleau [6], who developed the Autonomous Land Vehicle in a Neural Network (ALVINN) system in 1989, greatly inspired DAVE-2. This work demonstrated that an end-to-end trained neural network could successfully operate a vehicle on public roads. Our approach differs in that we can leverage a significantly larger dataset and processing power due to advancements made over the last 25 years. Additionally, our expertise in CNNs enables us to harness this powerful technology effectively. (By modern standards, ALVINN's fully-connected architecture is quite limited.)

Despite DAVE's potential for end-to-end learning, it lacked the reliability needed to fully replace more modular methods for off-road driving, even though it contributed to the initiation of the DARPA Learning Applied to Ground Robots (LAGR) program [7]. In challenging scenarios, DAVE's average distance between collisions was approximately 20 meters.

Nine months ago, NVIDIA launched a new initiative aimed at building upon DAVE and creating a dependable system for driving on public roads. The primary motivation behind this research is to eliminate the need to identify specific human-defined features, such as guardrails, lane markings, or other vehicles, and to avoid constructing a set of "if, then, else" rules based on these observations. The initial results of this new project are discussed in this paper.

LITERATURE REVIEW

Levinson et al. (2011) explore the concepts of scalability and safety within the context of autonomous vehicle (AV) testing, while Koopman & Wagner (2016) address the safety challenges inherent in this testing process. They highlight that simulations mitigate real-world risks by providing a secure and regulated environment for AV evaluation, facilitating extensive and cost-effective testing across a variety of driving scenarios.
Yan et al. (2021) present a survey of AV simulation platforms, and Dosovitskiy et al. (2017) introduce CARLA, an open-source platform designed for autonomous driving that features customizable city environments and sensor simulations. The SVL Simulator is noted for its cloud-based capabilities, allowing for tailored sensor and vehicle configurations. NVIDIA DRIVE is recognized for its high-fidelity simulations, which are particularly advantageous for perception systems in the automotive industry.

Li et al. (2019) conduct a comparison between high-fidelity and low-fidelity simulations, while Chen et al. (2020) emphasize the importance of fidelity in AV simulation. They assert that accurately replicating road conditions, sensor behaviors, and vehicle dynamics is crucial for reliable training and testing. Research indicates that simulations with higher fidelity effectively narrow the simulation-to-reality gap, which is essential for developing robust AV models.

Pomerleau (2019) underscores the importance of edge-case testing, and Shalev-Shwartz et al. (2016) discuss how testing in rare edge cases facilitates the modeling of complex road interactions, such as crosswalks, merging, and multi-vehicle scenarios. Simulations provide an effective means to investigate critical or uncommon events, such as severe weather conditions or unexpected obstacles, which are difficult to replicate in real-world tests.

Zablotskaia et al. (2020) investigate the use of synthetic data for training AVs, while Ros et al. (2016) discuss domain randomization techniques in AV data generation.

Diaz et al. (2019) examine the impacts of simulated sensor noise on AV perception models, and Geyer et al. (2020) focus on sensor simulation fidelity.

Richter et al. (2017) explore the use of virtual data for vision-based AVs, and Chen et al. (2017) discuss the training of AV perception systems using synthetic data.

Ma et al. (2018) delve into AV control through dynamics simulation, while Rampersad et al. (2020) emphasize the significance of high-fidelity simulations in vehicle dynamics.

Paden et al. (2016) address the challenges associated with motion planning in autonomous vehicles, and Buehler et al. (2007) highlight simulation-based decision-making testing conducted during the DARPA Urban Challenge.

Overview

A simplified block diagram of the DAVE-2 training data collection system is displayed in Figure 1. The data-acquisition vehicle has three cameras installed behind the windshield. The cameras' time-stamped footage is recorded concurrently with the human driver's steering angle, which is read from the car's Controller Area Network (CAN) bus. We encode the steering command as 1/r, where r is the turning radius in meters, so that our system is not dependent on the geometry of the car. To avoid a singularity when driving straight, we utilize 1/r rather than r (the turning radius for driving straight is infinite). From left turns (negative values) to right turns (positive values), 1/r gradually moves through zero. Single images taken from the video, combined with the appropriate steering command (1/r), make up the training data. It is insufficient to train on data from the human driver alone: the network needs to develop the ability to bounce back from errors, or the vehicle will gradually veer off the road. As a result, extra photos that depict the vehicle rotating away from the road's direction and shifting from the lane center are added to the training data.
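As a concrete illustration of the 1/r encoding described above, the following minimal Python sketch converts between a turning radius and the training label; the function names and the small tolerance used when decoding are our own choices for illustration and are not taken from the paper.

```python
import math

def steering_label(turning_radius_m: float) -> float:
    """Encode a steering command as 1/r (per meter).

    Left turns are negative, right turns positive, and the value passes
    smoothly through zero when driving straight, avoiding the singularity
    of an infinite straight-line turning radius.
    """
    if math.isinf(turning_radius_m):
        return 0.0          # driving straight: r -> infinity, so 1/r = 0
    return 1.0 / turning_radius_m

def turning_radius(inverse_r: float, eps: float = 1e-6) -> float:
    """Recover the turning radius in meters from a 1/r label."""
    if abs(inverse_r) < eps:
        return math.inf     # effectively straight
    return 1.0 / inverse_r

# Example: a gentle 50 m right turn becomes the label 0.02,
# while straight driving maps to 0.0.
print(steering_label(50.0), steering_label(math.inf))
```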
The left and right cameras provide images for two distinct off-center shifts. Viewpoint modification of the image from the closest camera simulates additional camera shifts and all rotations. We don't have the necessary 3D scene knowledge for precise viewpoint transformation. Therefore, we assume that all points above the horizon are infinitely far away and all points below the horizon are on flat ground in order to approximate the transformation. This creates distortions for objects that protrude above the ground, such as cars, poles, trees, and buildings, but it works well for flat terrain. Fortunately, network training is not much hampered by these distortions. For photos that have been altered, the steering label is changed to one that would direct the car back to the desired position and orientation within two seconds.

Figure 2 displays a block diagram of our training system. A CNN receives images and uses them to calculate a suggested steering command. The suggested command is compared with the desired command for that image, and the CNN's weights are adjusted to bring the CNN output closer to the intended output. The Torch 7 machine learning package's implementation of back propagation is used to achieve the weight adjustment. After training, the video images of a single center camera suffice for the network to produce steering; Figure 3 depicts this arrangement.
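The paper states only that the label for a shifted or rotated image is adjusted so that the car would return to the desired position and orientation within two seconds; it does not give the formula. The sketch below shows one plausible way such a correction could be computed under a simple constant-curvature, flat-ground assumption; the kinematic model, sign conventions, and function names are our assumptions rather than the paper's actual method.

```python
def adjusted_steering_label(base_inv_r: float,
                            lateral_offset_m: float,
                            heading_error_rad: float,
                            speed_mps: float,
                            recovery_time_s: float = 2.0) -> float:
    """Adjust a recorded 1/r label for a synthetically shifted/rotated view.

    The extra curvature is chosen so that, under a constant-curvature
    approximation, the artificial lateral offset (positive = right of the
    lane center) and heading error are removed over roughly
    `recovery_time_s` seconds of travel.
    """
    # Distance covered during the recovery horizon (guard against zero speed).
    d = max(speed_mps * recovery_time_s, 1.0)
    # For a constant extra curvature k applied over distance d:
    #   lateral displacement ~ heading_error * d + 0.5 * k * d**2
    # Solving for the k that drives the displacement plus the current offset
    # back to zero gives the correction below (negative = steer further left).
    correction = -2.0 * (lateral_offset_m + heading_error_rad * d) / (d ** 2)
    return base_inv_r + correction

# Example: a view shifted 0.5 m to the right of the lane center at 15 m/s
# yields a label slightly to the left of the recorded command.
print(adjusted_steering_label(0.0, lateral_offset_m=0.5,
                              heading_error_rad=0.0, speed_mps=15.0))
```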
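The weight adjustment itself follows the standard supervised pattern: compare the suggested steering command with the desired one and backpropagate the mean squared error. The paper used Torch 7; the hedged sketch below uses PyTorch purely as modern notation, and the helper names are assumed. The target would be the recorded human command, or the adjusted command for a shifted image as sketched above.

```python
import torch
import torch.nn as nn

def training_step(model: nn.Module,
                  optimizer: torch.optim.Optimizer,
                  images: torch.Tensor,         # batch of YUV frames, NCHW
                  target_inv_r: torch.Tensor    # desired 1/r labels, shape (N, 1)
                  ) -> float:
    """One weight update: move the predicted steering toward the desired one."""
    optimizer.zero_grad()
    predicted_inv_r = model(images)                      # suggested steering commands
    loss = nn.functional.mse_loss(predicted_inv_r, target_inv_r)
    loss.backward()                                      # backpropagation
    optimizer.step()                                     # adjust the CNN's weights
    return loss.item()
```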
Data Collection

The training data for the DAVE-2 system was gathered by driving across a diverse range of roadways, capturing a variety of lighting and weather conditions. The primary focus was on central New Jersey, although highway data was also collected from states such as Illinois, Michigan, Pennsylvania, and New York. The types of roads included in the dataset were:

• Highways: High-speed driving scenarios to understand vehicle behavior in fast-moving traffic.
• Tunnels: Enclosed environments to assess performance under varying lighting conditions.
• Gravel Roads: Off-road conditions to evaluate handling on non-paved surfaces.
• Residential Roads: Areas with parked cars, simulating urban driving challenges.
• Two-Lane Roads: Featuring both marked and unmarked lanes to test lane-following capabilities.

The dataset also covered a range of weather and visibility conditions:

• Clear: Ideal visibility conditions.
• Cloudy: Reduced light levels affecting perception.
• Foggy: Challenging visibility requiring advanced sensor capabilities.
• Snowy: Adverse weather impacting traction and visibility.
• Wet: Rainy conditions that alter road dynamics.

Additionally, there were instances where the sun was low in the sky, causing glare from the windshield that reflected off the road surface, presenting further challenges for the perception system.

Data was collected using two vehicles: a 2016 Lincoln MKZ equipped with a drive-by-wire system and a 2013 Ford Focus with cameras positioned similarly to those in the Lincoln. Importantly, the system is not constrained to any specific car brand or model, allowing for flexibility in data acquisition.

Drivers were encouraged to remain fully attentive while driving but were allowed to operate the vehicles as they normally would, which helped ensure that the collected data reflected real-world driving behavior. As of March 28, 2016, approximately 72 hours of driving data had been successfully gathered, providing a comprehensive dataset for training the DAVE-2 system to navigate a wide array of driving scenarios effectively.

Network Architecture

The DAVE-2 system employs a neural network designed to minimize the mean squared error between the network's predicted steering command and the actual commands provided by a human driver, as well as the modified steering commands for rotated and off-center images. The architecture of the network is illustrated in Figure 4 and consists of nine layers, including:

• Input Layer: The input image is first divided into YUV color planes, which helps in processing the image data more effectively.
• Normalization Layer: The first layer of the network is responsible for image normalization. This normalization process is hard-coded and remains unchanged during the training phase. By embedding normalization within the network, the process can be optimized using GPU processing, allowing for faster computations and adaptability to different network architectures.
• Convolutional Layers: The network contains five convolutional layers aimed at feature extraction. The configuration of these layers was determined empirically through a series of experiments that varied their settings. The specifics of the convolutional layers are as follows:
  • First Three Convolutional Layers: These utilize strided convolutions with a stride of 2 and a kernel size of 5x5. This configuration helps in down-sampling the input while extracting relevant features.
  • Final Two Convolutional Layers: These employ non-strided convolutions with a kernel size of 3x3, which allows for finer feature extraction without reducing the spatial dimensions of the input.
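A minimal sketch of a network in this style is given below. Only the layer types follow the description above (hard-coded normalization, three 5x5 strided convolutions, two 3x3 non-strided convolutions, and a single steering output); the filter counts, the 66x200 input resolution, and the three fully connected layers at the end are assumptions filled in for illustration, and PyTorch is used only as a convenient notation.

```python
import torch
import torch.nn as nn

class DriveNet(nn.Module):
    """Sketch of a nine-layer steering network as described in the text."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            # Three strided 5x5 convolutions (filter counts are assumed).
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            # Two non-strided 3x3 convolutions.
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 1),            # single output: the 1/r steering command
        )

    def forward(self, yuv: torch.Tensor) -> torch.Tensor:
        # Hard-coded normalization layer: a fixed scaling that is not learned.
        x = yuv / 127.5 - 1.0
        x = self.conv(x)
        return self.fc(x)

# A single 66x200 YUV frame produces one steering value.
print(DriveNet()(torch.zeros(1, 3, 66, 200)).shape)   # torch.Size([1, 1])
```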
Simulation

Initially, we evaluate the network's performance through simulation before deploying a trained CNN in real-world scenarios. Figure 5 illustrates a simplified block diagram of the simulation process. The simulator generates images that closely mimic what the CNN would perceive while steering the vehicle, using pre-recorded footage from a forward-facing on-board camera mounted on a human-operated data-collection vehicle. The steering commands recorded from the human driver are synchronized with these test videos.

To ensure accuracy, we manually calibrate the lane center for each frame in the video used by the simulator, acknowledging that human drivers may not always maintain a central position in the lane. This calibrated position is referred to as the "ground truth." To account for deviations from this ground truth, the simulator adjusts the source images accordingly. It is important to note that any discrepancies between the ground truth and the actual path taken by the human driver are incorporated into this adjustment. The transformation is performed using the same methods described in Section 2.

The simulator accesses the recorded test video along with the corresponding steering commands that were executed during the video's capture. The first frame of the selected test video, modified to reflect any deviations from the ground truth, is then provided as input to the trained CNN. The CNN subsequently generates a steering command for that frame. The position and orientation of the simulated vehicle are updated by feeding the CNN's steering commands, along with the recorded inputs from the human driver, into a dynamic model [8].

Next, the simulator alters the subsequent frame of the test video to reflect the car's new position achieved by following the CNN's steering instructions. This process is repeated as the new image is fed back into the CNN. The simulator tracks various metrics, including yaw, the distance traveled by the virtual vehicle, and the off-center distance, which measures how far the car is from the lane center. If the off-center distance exceeds one meter, the virtual vehicle's location and orientation are corrected to align with the ground truth for the corresponding frame of the original test video, simulating a virtual human intervention.
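The closed-loop procedure described above can be summarized in the following Python sketch. Every component here (the image-shifting helper, the vehicle model, and the pose representation) is a placeholder standing in for the paper's actual machinery; only the one-meter intervention rule and the snap-back to the ground truth follow the text.

```python
def run_simulation(frames, ground_truth, cnn, vehicle_model, shift_image,
                   max_off_center_m: float = 1.0) -> int:
    """Replay a recorded test video in closed loop and count interventions.

    `frames` and `ground_truth` are per-frame recordings; `cnn` maps an
    image to a 1/r command; `vehicle_model` advances the simulated pose;
    `shift_image` renders a frame as it would appear from the simulated
    pose. All of these are placeholders for the components in the text.
    """
    interventions = 0
    pose = ground_truth[0]                      # start on the calibrated lane center
    for frame, truth in zip(frames, ground_truth):
        view = shift_image(frame, pose, truth)  # account for deviation from ground truth
        steering = cnn(view)                    # network's suggested steering command
        pose = vehicle_model(pose, steering)    # update position and orientation
        if abs(pose.off_center_m) > max_off_center_m:
            interventions += 1                  # simulated human intervention
            pose = truth                        # snap back to the ground-truth pose
    return interventions
```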
Evaluation

Our networks undergo two stages of evaluation: simulation and on-road testing. An ensemble of preset test routes, equivalent to roughly three hours and 100 miles of driving in Monmouth County, New Jersey, are given steering directions by the networks in our simulator. The test data, which covers local roadways, residential streets, and highways, was collected under various lighting and weather circumstances.

Simulation Tests

We calculate the proportion of time the network could operate the vehicle on its own. Simulated human interventions are counted to determine the metric (see Section 6). These interventions take place when the simulated vehicle deviates more than one meter from the center line. We assume an actual intervention would take six seconds in real life: the time for a person to regain control of the car, center it, and then switch back to the self-steering mode. By counting the number of interventions, multiplying by six seconds, dividing by the elapsed time of the simulated test, and then deducting the result from one, we get the percentage autonomy:

autonomy [%] = (1 - (number of interventions · 6 seconds) / (elapsed time [seconds])) · 100    (1)

For example, 10 interventions during a 600-second simulated run give an autonomy of (1 - 10 · 6 / 600) · 100 = 90%.

On-road Tests

Once a trained network has proven to be a reliable system in the simulator, it is loaded onto the DRIVE™ PX in our test vehicle and driven on the road. The percentage of time the vehicle spends using autonomous steering is how we gauge performance in these tests. Lane changes and turns from one road to another are not included in this time. The vehicle is autonomous about 98% of the time when driving from our Holmdel office to Atlantic Highlands in Monmouth County, New Jersey. Additionally, there were no interventions during our ten miles of travel on the Garden State Parkway, a multilane divided highway with on and off ramps.

Visualization of Internal CNN State

The activations of the first two feature map layers for two different example inputs, a forest scene and an unpaved road, are illustrated in Figures 7 and 8. The feature map activations for the unpaved road clearly highlight the road's structure, while the activations for the forest image predominantly exhibit noise, indicating that the CNN struggles to extract meaningful information from this type of input.

This observation suggests that, by utilizing only the human steering angle as a training signal, the CNN has effectively learned to identify relevant road features autonomously. Notably, we did not explicitly train the network to recognize
road outlines, yet it successfully developed
this capability on its own.
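For a network written in the style of the DriveNet sketch shown earlier, such feature-map activations could be captured with forward hooks, as in the hedged sketch below; the assumption that the convolutional stack is exposed as model.conv is ours, not the paper's.

```python
import torch

def first_two_feature_maps(model, yuv_frame: torch.Tensor):
    """Capture the activations of the first two convolutional layers."""
    captured = []

    def hook(_module, _inputs, output):
        captured.append(output.detach())

    # Attach hooks to the first two Conv2d modules in the convolutional stack.
    conv_layers = [m for m in model.conv if isinstance(m, torch.nn.Conv2d)][:2]
    handles = [layer.register_forward_hook(hook) for layer in conv_layers]
    with torch.no_grad():
        model(yuv_frame)
    for handle in handles:
        handle.remove()
    return captured   # two tensors holding the per-filter activation maps
```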

Results

Conclusion
Without the need for manual
categorization into specific tasks such
as road or lane sign recognition,
semantic abstraction, path planning,
and control, we have empirically
demonstrated that CNNs can
effectively learn the entire task of
lane and road following. The vehicle
was trained to operate under various
conditions, including sunny, overcast,
and rainy weather, as well as on
highways and local residential roads,
utilizing a minimal dataset of less
than 100 hours of driving.
Remarkably, with steering commands
as the sole training signal, the CNN is

capable of learning important road features.

For example, during the training process, the system autonomously learns to identify the outline of a road without requiring explicit labeling of the data. However, additional efforts are necessary to enhance the network's robustness, develop methods for validating its performance, and improve the visualization of the network's internal processing mechanisms.

References

[1] LeCun, Y., Hubbard, B., Denker, J. S., Henderson, D., Howard, R. E., and Jackel, L. D. Handwritten zip code recognition using backpropagation. Neural Computation, 1(4):541–551, Winter 1989. Available at: http://yann.lecun.org/exdb/publis/pdf/lecun-89e.pdf

[2] Hinton, G. E., Sutskever, I., and Krizhevsky, A. Using deep convolutional neural networks for ImageNet categorization. In Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), pages 1097–1105. Curran Associates, Inc., 2012. Available at: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

[3] Strom, B. I., Zuckert, D., Stenard, C. E., Sharman, D., and Jackel, L. D. Character recognition with optical means for self-service banking. AT&T Technical Journal, 74(1):16–24, 1995.

[4] Large Scale Visual Recognition Challenge (ILSVRC). Available at: http://www.image-net.org/challenges/LSVRC/

[5] Net-Scale Technologies, Inc. End-to-end learning for autonomous off-road vehicle control. Final technical report, July 2004. Available at: http://net-scale.com/doc/net-scale-dave-report.pdf

[6] Pomerleau, D. A. ALVINN: A neural network-based autonomous land vehicle. Carnegie Mellon University technical report, 1989. Available at: http://repository.cmu.edu/cgi/viewcontent.cgi?article=2874&context=compsci

[7] DARPA LAGR program. Wikipedia.org. Available at: https://en.wikipedia.org/wiki/DARPA_LAGR

[8] Qi, F. and Wang, D. Four-wheel steering vehicle trajectory planning. In Proceedings of the IEEE International Conference on Robotics & Automation, May 21–26, 2001. Available at: www.ntu.edu.sg/home/edwwang/confpapers/wdwicar01.pdf

[9] DAVE-2 operating a Lincoln. Available at: https://drive.google.com/open?id=0B9raQzOpizn1TkRIa241ZnBEcjQ