
IJCSNS International Journal of Computer Science and Network Security, VOL.23 No.11, November 2023

Classification of Objects using CNN-Based Vision and Lidar Fusion in Autonomous Vehicle Environment
G. Komali 1, Dr. A. Sri Nagesh 2
1M.Tech Scholar, CSE Department, R.V.R & J.C College of Engineering,
Guntur, Andhra Pradesh, India. [email protected]
2 Professor, CSE Department, R.V.R & J.C College of Engineering,
Guntur, Andhra Pradesh, India. [email protected]

Abstract
In the past decade, Autonomous Vehicle Systems (AVS) have advanced at an exponential rate, particularly due to improvements in artificial intelligence, which have had a significant impact on social and road safety and on the future of transportation systems. The fusion of light detection and ranging (LiDAR) and camera data in real time is known to be a crucial process in many applications, such as autonomous driving, industrial automation and robotics. Especially in the case of autonomous vehicles, the efficient fusion of data from these two types of sensors is highly important for estimating the depth of objects as well as for classifying objects at short and long distances. This paper presents classification of objects using CNN-based vision and Light Detection and Ranging (LiDAR) fusion in the autonomous vehicle environment. The method is based on a convolutional neural network (CNN) and image upsampling theory. By upsampling the LiDAR point cloud and converting it into pixel-level depth information, the depth information is combined with Red Green Blue (RGB) data and fed into a deep CNN. The proposed method obtains informative feature representations for object classification in the autonomous vehicle environment using the integrated vision and LiDAR data, and is adopted to guarantee both object classification accuracy and minimal loss. Experimental results show the effectiveness and efficiency of the presented approach for object classification.
Keywords:
Autonomous Vehicle Systems (AVS), Light Detection and Ranging (LIDAR), CNN


1. Introduction

As technology constantly evolves, autonomous vehicles are becoming more popular, accessible, and affordable for people in different countries and from different economic classes. Increasing accessibility results in a safer transportation experience, with fewer deaths and fewer injuries caused by the human mistakes behind catastrophic accidents. To ensure the safety of individuals, it is necessary to deploy highly efficient and accurate learning models trained on a broad range of driving scenarios to precisely detect the surrounding objects under different weather and lighting conditions. This learning procedure via training will adjust the vehicle's decision-making process and control mechanism to take the necessary actions [1].

The interest in autonomous vehicles has increased in recent years due to advances in multiple engineering fields such as machine learning, robotic systems and sensor fusion. The progress of these techniques leads to more robust and trustworthy computer vision algorithms. Using sensors such as Laser Imaging Detection and Ranging (LiDAR), radar, cameras or ultrasonic sensors with these techniques enables the system to detect relevant targets in dynamic surrounding scenarios. These targets may include pedestrians, cyclists, cars or motorbikes, among others, as discussed in public autonomous car datasets [2].

Sensors in autonomous vehicles (AV) are used in two main categories: localization, to measure where the vehicle is on the road, and environment perception, to detect what is around the vehicle. Global Navigation Satellite System (GNSS), Inertial Measurement Unit (IMU), and vehicle odometry sensors are used to localize the AV. Localization is needed so the AV knows its position with respect to the environment. Autonomous vehicles should be instantaneous, accurate, stable, and computationally efficient in order to produce safe and acceptable traveling trajectories in scenarios ranging from urban to suburban and from high-density traffic flow to high-speed highways. In real-world traffic, various uncertainties and complexities surround road and weather conditions, while dynamic interactions exist between objects and obstacles and between tires and driving terrains. An autonomous vehicle must rapidly and accurately detect, recognize, classify, and track dynamic objects against complex backgrounds, which poses technical challenges.

Driving environment recognition covers dynamic and static object detection, lane detection, and vehicle location estimation based on sensors that can obtain information about the driving environment, together with the decisions that determine the vehicle trajectory, such as the creation of routes to the destination and obstacle avoidance. Longitudinal and lateral controls are then performed to reliably track the target control values of the vehicle determined by recognition and decision [3].

To ensure correct and safe driving, a fundamental pillar of the Autonomous Driving (AD) system is perception, which leverages sensors such as cameras and LiDARs (Light Detection and Ranging) to detect surrounding obstacles in real time [4].

Manuscript received November 5, 2023


Manuscript revised November 20, 2023
https://doi.org/10.22937/IJCSNS.2023.23.11.8

Autonomous vehicles rely on their perception systems to acquire information about their immediate surroundings. It is necessary to detect the presence of other vehicles, pedestrians, and other relevant entities. Safety concerns and the need for accurate estimations have led to the introduction of LiDAR systems to complement camera- or radar-based perception systems [5]. Regarding object distance estimation, several approaches have been proposed, depending on the modality of the sensors used, such as radar, LiDAR, or camera. Each sensor modality perceives the environment from a specific perspective and is limited to detecting certain attributes of objects. More specifically, vision-based approaches are more robust and accurate in object detection but fail to estimate the distance of the object accurately [6].

Deep learning algorithms have been utilized in different aspects of AV systems, such as perception, mapping, and decision making. These algorithms have proven their ability to solve many of these difficulties, including the computational loads faced by traditional algorithms, while maintaining decent accuracy and fast processing speed. Currently, high-performance vision systems are usually based on deep learning techniques, and deep neural networks (DNN) have proven to be an extremely powerful tool for many vision tasks. Hence, in this paper, classification of objects using CNN-based vision and LiDAR fusion in the autonomous vehicle environment is presented.


II. Literature Survey

G. Ajay Kumar, Jin Hee Lee, Jongrak Hwang, Jaehyeong Park, Sung Hoon Youn and Soon Kwon [7] present a LiDAR and camera fusion approach for object distance estimation in self-driving vehicles. The paper presents a method to estimate the distance (depth) between a self-driving car and other vehicles, objects, and signboards on its path using an accurate fusion approach. Based on geometrical transformation and projection, low-level sensor fusion was performed between a camera and LiDAR using a 3D marker.

Jian Nie, Jun Yan, Huilin Yin, Lei Ren, and Qian Meng [8] present a multimodality fusion deep neural network and safety test strategy for intelligent vehicles. They first propose a multimodality fusion framework called Integrated Multimodality Fusion Deep Neural Network (IMF-DNN), which can flexibly accomplish both object detection and an end-to-end driving policy for prediction of steering angle and speed.

Yulong Cao, Chaowei Xiao, Benjamin Cyr, Yimeng Zhou, Won Park, Sara Rampazzi, Qi Alfred Chen, Z. Morley Mao and Kevin Fu [9] present an adversarial sensor attack on LiDAR-based perception in autonomous driving. They perform the first security study of LiDAR-based perception in AV settings, which is highly important but previously unexplored. They consider LiDAR spoofing attacks as the threat model and set the attack goal as spoofing obstacles close to the front of a victim AV.

Mhafuzul Islam, Mashrur Chowdhury, Hongda Li, and Hongxin Hu [10] present vision-based navigation of autonomous vehicles in roadway environments with unexpected hazards. They develop a DNN-based autonomous vehicle driving system using object detection and semantic segmentation to mitigate the adverse effect of this type of hazard, which helps the autonomous vehicle to navigate safely around such hazards. They find that their DNN-based autonomous vehicle driving system performs well, including for hazardous object detection and semantic segmentation.

Babak Shahian Jahromi, Theja Tulabandhula and Sabri Cetin [11] present a real-time hybrid multi-sensor fusion framework for perception in autonomous vehicles. They propose a new hybrid multi-sensor fusion pipeline configuration that performs environment perception for autonomous vehicles, covering road segmentation, obstacle detection, and tracking. Tested on over 3K road scenes, their fusion algorithm shows better performance in various environment scenarios compared to baseline benchmark networks.

Bike Chen, Chen Gong and Jian Yang [12] discuss importance-aware semantic segmentation for autonomous vehicles. The IAL (Importance-Aware Loss) operates under a hierarchical structure, and classes with different importance are located at different levels so that they are assigned distinct weights. They derive the forward and backward propagation rules for IAL and apply them to four typical deep neural networks for realizing semantic segmentation in an intelligent driving system.

Jin Fang, Feilong Yan, Tongtong Zhao, Feihu Zhang, Dingfu Zhou, Ruigang Yang, Yu Ma and Liang Wang [13] present a method for simulating LiDAR point clouds for autonomous driving using real-world scenes and traffic flows. They present a LiDAR simulation framework that can automatically generate 3D point clouds based on LiDAR type and placement.

Xinxin Du, Marcelo H. Ang Jr. and Daniela Rus [14] present car detection for autonomous vehicles using a LiDAR and vision fusion approach within a deep learning framework. They propose a LiDAR and vision fusion system for car detection through the deep learning framework. With further optimization of the framework structure, it has great potential to be implemented on an autonomous vehicle.

Andreas Eitel, Jost Tobias Springenberg, Luciano Spinello, Martin Riedmiller and Wolfram Burgard [15] present multimodal deep learning for robust RGB-D object recognition. The architecture is composed of two separate CNN processing streams, one for each modality, which are subsequently combined with a late fusion network. They focus on learning with imperfect sensor data, a typical problem in real-world robotics tasks.


III. Classification of Objects using CNN

In this work, classification of objects using CNN-based vision and LiDAR fusion in the autonomous vehicle environment is presented. The framework of the presented model is shown in Fig. 1.

Fig. 1: THE FRAMEWORK OF PRESENTED MODEL

Autonomous vehicles use various sensors, such as LiDAR, radar, cameras, and ultrasonic sensors, to map and recognize the environment surrounding the vehicle. Considering real-time performance, object detection and classification ability, and accurate distance estimation, LiDAR and vision fusion techniques have been introduced for object detection based on different levels of data fusion.

A Laser Imaging Detection and Ranging (LiDAR) sensor has been used in this project to gather information regarding the environment. Differently from the camera, the LiDAR sensor transmits laser pulses and measures the time it takes until a reflection is received. Based on this, it calculates the distance of the target and its 3D coordinates, since the angles used to send the laser pulse and the distance are known.

One of the largest and most widely used benchmark datasets in the autonomous driving research community is the KITTI dataset, which is used here. It provides LiDAR point clouds, stereo color and grayscale pictures, and GPS coordinates. The data was captured on the highways and rural areas of Karlsruhe, a mid-sized city in Germany. The tasks that can utilize this dataset include 3D object detection, visual odometry, stereo matching, and optical flow. The object detection part of the dataset consists of 7,481 training and 7,518 test images, with annotated boxes around the objects of interest.

The camera and LiDAR on the vehicle are used to collect images. We first obtain the sparse depth map by projecting the Velodyne laser point cloud data from the KITTI database onto the RGB image plane using the calibration matrix. Then, we upsample the sparse depth map to a high-resolution depth image. We extract four kinds of objects (pedestrian, cyclist, car, and truck) from each image by considering the ground truth from KITTI. The rapid growth of research and commercial enterprises relating to autonomous robots, drones, humanoid robots, and AVs has established a high demand for LiDAR sensors due to performance attributes such as measurement range and accuracy, robustness to changes in the surroundings, and high scanning speed.
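As a concrete illustration of the projection step just described, the following sketch builds a sparse depth map from a Velodyne scan. It assumes the standard KITTI calibration matrices P2 (camera projection), R0_rect (rectification) and Tr_velo_to_cam (LiDAR-to-camera transform) have already been parsed into NumPy arrays; it is not the authors' published code.

```python
# Minimal sketch: project a KITTI Velodyne scan onto the RGB image plane
# to obtain a sparse depth map (zeros where no LiDAR return hits a pixel).
import numpy as np

def velo_to_sparse_depth(points, P2, R0_rect, Tr_velo_to_cam, img_h, img_w):
    """points: (N, 4) array of x, y, z, intensity from the Velodyne scan."""
    xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])  # homogeneous coords

    # Transform LiDAR points into the rectified camera frame.
    cam = R0_rect @ (Tr_velo_to_cam @ xyz1.T)              # (3, N)
    in_front = cam[2, :] > 0                               # keep points ahead of the camera
    cam = cam[:, in_front]

    # Project onto the image plane with the camera matrix P2.
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])   # (4, N)
    pix = P2 @ cam_h                                       # (3, N)
    u = (pix[0, :] / pix[2, :]).astype(int)
    v = (pix[1, :] / pix[2, :]).astype(int)
    depth = cam[2, :]                                      # depth = z in camera frame

    # Keep only pixels inside the image and build the sparse depth map.
    valid = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    sparse = np.zeros((img_h, img_w), dtype=np.float32)
    sparse[v[valid], u[valid]] = depth[valid]
    return sparse
```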
We build three image datasets according to these objects. One dataset contains the pure RGB images of the four kinds of objects; one contains gray-scale images whose gray levels correspond to the actual distance information from the LiDAR point clouds; and the third is an RGB-LiDAR image dataset combining the former two kinds of information. Each dataset comprises 6843 labeled objects. Finally, we present a structure based on a CNN to train a classifier for detecting the four kinds of objects on the road. These classification results are provided to the driving cognitive module for vehicle decision-making and control.
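A minimal sketch of how the three datasets could be assembled from a single frame is shown below, assuming the dense depth map produced by the upsampling step described next; the function and variable names here are illustrative and not taken from the paper.

```python
# Illustrative sketch (not the authors' code) of building the three samples
# for one frame, given the RGB image and a dense depth map from the
# sparse-to-dense upsampling step described below.
import numpy as np

def build_samples(rgb, dense_depth):
    """rgb: (H, W, 3) uint8 image; dense_depth: (H, W) float32 depth map."""
    # Dataset 1: the pure RGB image.
    rgb_sample = rgb
    # Dataset 2: gray-scale image whose gray level encodes distance.
    scale = dense_depth.max() if dense_depth.max() > 0 else 1.0
    depth_gray = (255.0 * dense_depth / scale).astype(np.uint8)
    # Dataset 3: 4-channel RGB-LiDAR sample (RGB and depth stacked).
    rgbd_sample = np.dstack([rgb.astype(np.float32), dense_depth])
    return rgb_sample, depth_gray, rgbd_sample
```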
LiDAR gives us rich information in the form of point clouds (which include position coordinates x, y, z and intensity i) as well as a depth map. The first step is to process the LiDAR point cloud data (PCD). The raw LiDAR point cloud is high resolution and covers a long distance (for example, a 64-beam laser acquires more than 1 million points per second). Dealing with such a large number of points is computationally expensive and will affect the real-time performance of our pipeline.

In this study, a novel method of upsampling the LiDAR range inputs is employed to align depth with the RGB images. In this method, we compute dense depth maps from the original range data alone instead of using information from the RGB images. We formulate the upsampling using a bilateral filtering formalism to generate the dense map D (output image) from a noisy and sparse depth image I. Assuming that the input I is given in pixel coordinates and is calibrated with respect to a high-resolution camera, the pixel positions in I are non-integers owing to the uncertainty of the calibration parameters and to data sparsity. Based on the intensity value I_p of a pixel p on the depth map and its neighborhood mask N, the pixel value at the same position of the output map, D_p, is computed as shown in the following equation:

D_p = \frac{1}{W_p} \sum_{q \in N} G_{\sigma_r}\!\left(\lvert I_p - I_q \rvert\right) G_{\sigma_s}\!\left(\lVert p - q \rVert\right) I_q \qquad (1)

where G_{\sigma_r} penalizes the influence of points q according to their range values, G_{\sigma_s} weighs inversely to the distance between position p and location q, and W_p is the normalization factor, which ensures that the weights sum to one. In (1), we set G_{\sigma_s} to be inversely proportional to the Euclidean distance \lVert p - q \rVert between pixel position p and location q.
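Equation (1) can be sketched directly in NumPy as follows; the window size and the sigma values are illustrative assumptions, since the paper does not report its exact parameters.

```python
# A minimal NumPy sketch of the joint bilateral upsampling in Eq. (1).
import numpy as np

def bilateral_upsample(sparse, window=7, sigma_s=3.0, sigma_r=2.0):
    """sparse: (H, W) float32 sparse depth map, 0 where no LiDAR return exists."""
    h, w = sparse.shape
    r = window // 2
    dense = np.zeros_like(sparse)

    # Precompute the spatial kernel G_sigma_s over the window offsets.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    g_s = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma_s ** 2))

    padded = np.pad(sparse, r, mode="constant")
    for y in range(h):
        for x in range(w):
            patch = padded[y:y + window, x:x + window]
            valid = patch > 0                       # neighbours q with a depth value
            if not valid.any():
                continue
            # If the centre pixel has no LiDAR return, use the local mean as a
            # proxy for I_p (an assumption made for this sketch).
            center = sparse[y, x] if sparse[y, x] > 0 else patch[valid].mean()
            # Range kernel G_sigma_r penalizes neighbours with very different depth.
            g_r = np.exp(-((patch - center) ** 2) / (2.0 * sigma_r ** 2))
            weights = g_s * g_r * valid             # combined weights of Eq. (1)
            w_p = weights.sum()
            if w_p > 0:
                dense[y, x] = (weights * patch).sum() / w_p   # normalized by W_p
    return dense
```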
The RGB images and the 3-D point clouds from KITTI are used as object benchmarks to classify objects such as cars, pedestrians, trucks, and cyclists. The RGB color images are captured by the left color video camera (10 Hz, resolution: 1392 × 512 pixels, opening angle: 90° × 35°), whereas the 3-D point clouds are produced by a Velodyne HDL-64E unit and projected back into image form. As one of the few available sensors that provide depth information, the Velodyne system can generate accurate 3-D data from moving platforms. It can also be applied in outdoor scenarios and offers a long sensing range compared with structured-light systems such as the Microsoft Kinect.

Convolutional neural networks (CNN) have been extensively applied to image classification and computer vision, and have achieved state-of-the-art classification rates on datasets such as ImageNet. In a CNN architecture, successive layers of neurons learn progressively complex features in a supervised way by back-propagating classification errors, with the last layer representing the output image categories. CNNs do not use a distinct feature extraction module or a classification module; that is, CNNs do not require unsupervised pre-training, the input representation is learned implicitly through supervised training, and the need for manual feature description and extraction is eliminated. A CNN extracts features from raw data based on pixel values, leading to the final object categories.

Each layer in the CNN finds successively more complex features: the first layer finds small, simple features anywhere in the image, the second layer finds more complex features, and so on. At the last layer, these feature maps are processed using fully connected neural network (FCNN) layers.

AlexNet is a leading CNN architecture for object-recognition tasks, has broad applications in the computer vision area of artificial intelligence, and remains widely adopted for image tasks. AlexNet has eight layers with learnable parameters: the model consists of five convolutional layers, combined with max pooling, followed by three fully connected layers, and ReLU (Rectified Linear Unit) activation is used in each of these layers except the output layer.

For object classification, we classify the images from KITTI into cars, cyclists, pedestrians, and trucks. We adopt the AlexNet model as our CNN architecture. AlexNet comprises five convolutional layers (named conv1–conv5) and three fully connected layers (named fc6, fc7, and fc8). Each convolutional layer contains multiple kernels, and each kernel represents a 3-D filter connected to the outputs of the previous layer. In the fully connected layers, each layer comprises multiple neurons, each of which holds a single activation value and is connected to all neurons in the previous layer. We resize the captured images to 128 × 128 resolution for valid input and then pass them into AlexNet. AlexNet is originally trained for 1000 classes; we change the size of the fc8 layer from 1000 to 4 to match our dataset with four classes. The parameters from layer conv1 to layer fc6 are kept fixed to prevent overfitting.
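A hedged PyTorch sketch of this transfer-learning setup is shown below; the paper does not name its deep learning framework, and the optimizer choice and learning rate here are assumptions.

```python
# Sketch: pretrained AlexNet with fc8 resized from 1000 to 4 outputs and the
# layers from conv1 to fc6 frozen, as described above. torchvision's AlexNet
# accepts 128 x 128 inputs thanks to its adaptive average pooling layer.
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace fc8 (classifier[6] in torchvision's AlexNet) to output the 4 classes:
# car, cyclist, pedestrian, truck.
model.classifier[6] = nn.Linear(4096, 4)

# Freeze conv1-conv5 (model.features) and fc6 (classifier[1]) to prevent overfitting.
for param in model.features.parameters():
    param.requires_grad = False
for param in model.classifier[1].parameters():
    param.requires_grad = False

# Only the remaining (unfrozen) parameters are optimized.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)  # illustrative settings
criterion = nn.CrossEntropyLoss()
```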
This RGB-LiDAR-based method notably improves the average precision of classification for the four categories in the KITTI dataset. We use the same dataset for training and testing the models.


IV. Results Analysis

In this section, the design and implementation results of classification of objects using CNN-based fusion of vision and LiDAR in the autonomous vehicle environment are discussed.

To implement this project we have designed the following modules: i) Upload Kitti Dataset: using this module we upload the dataset to the application; ii) Load Alexnet LIDAR CNN Model: this module reads all images, applies upsampling to increase image intensity, and then extracts the RGB values to train the AlexNet CNN model; iii) Run LIDAR Object Detection & Classification: in this module we upload a test image, and the AlexNet model detects and classifies the objects in that test image; and iv) LIDAR Accuracy & Loss Graph: using this module we plot the LIDAR AlexNet accuracy and loss graphs; the algorithm is trained for 10 epochs, and the graph shows the accuracy for each epoch.

Click on the 'Upload Kitti Dataset' button to upload the dataset to the application. In this screen (Fig. 2) we select and upload the dataset folder, which contains different types of objects. The dataset has 3 different classes, and opening any folder shows the images of that type. The loaded dataset is shown in Fig. 3.

Fig. 2: KITTI DATASET SCREEN

Fig. 3: LOADED DATASET SCREEN

In the screen above, the red text indicates that the CNN model has been loaded. Now click on the 'Run LIDAR Object Detection & Classification' button to upload a test image and obtain the output below (Fig. 4).

Fig. 4: TEST IMAGE UPLOADING SCREEN

In the screen above, the 7.bmp image is selected and uploaded; clicking the 'Open' button produces the classified output shown in Fig. 5.

Fig. 5: CLASSIFIED OBJECT IMAGE
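Internally, the classification step behind these screens might look roughly like the following sketch, reusing the fine-tuned model from the previous section; the normalization constants and the plain RGB input are illustrative assumptions.

```python
# Rough sketch of the 'Run LIDAR Object Detection & Classification' step:
# load a test image (e.g. 7.bmp from the walkthrough), preprocess it, and
# report the predicted class name.
import torch
from PIL import Image
from torchvision import transforms

CLASS_NAMES = ["car", "cyclist", "pedestrian", "truck"]

preprocess = transforms.Compose([
    transforms.Resize((128, 128)),                 # the paper resizes inputs to 128 x 128
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # assumed ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def classify_image(model, path="7.bmp"):
    image = Image.open(path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)          # shape (1, 3, 128, 128)
    model.eval()
    with torch.no_grad():
        logits = model(batch)
    return CLASS_NAMES[logits.argmax(dim=1).item()]
```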

Now click on the 'LIDAR Accuracy & Loss Graph' button to display the accuracy and loss graph shown below.

Fig. 6: FINAL OUTPUT GRAPH
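A sketch of how the 10-epoch training run and the accuracy/loss curves of Fig. 6 could be produced, reusing the model, optimizer and criterion sketched earlier, is given below; the DataLoader named train_loader and its batching are assumptions, since the paper only states that training runs for 10 epochs.

```python
# Sketch: 10-epoch training loop with per-epoch accuracy/loss tracking and plot.
import matplotlib.pyplot as plt

def train_and_plot(model, train_loader, optimizer, criterion, epochs=10):
    acc_history, loss_history = [], []
    for epoch in range(epochs):
        model.train()
        correct, total, running_loss = 0, 0, 0.0
        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)

        acc_history.append(correct / total)
        loss_history.append(running_loss / total)
        print(f"epoch {epoch + 1}: accuracy={acc_history[-1]:.3f} loss={loss_history[-1]:.3f}")

    # Plot accuracy and loss per epoch, as in the 'LIDAR Accuracy & Loss Graph' module.
    plt.plot(range(1, epochs + 1), acc_history, label="accuracy")
    plt.plot(range(1, epochs + 1), loss_history, label="loss")
    plt.xlabel("epoch")
    plt.legend()
    plt.show()
```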
V. Conclusion

In this work, classification of objects using CNN-based vision and LiDAR fusion in the autonomous vehicle environment is presented. We propose a deep-learning-based approach that fuses vision and LiDAR data for object detection and classification in the autonomous vehicle environment. On the one hand, we upsample the LiDAR point clouds and convert the upsampled point cloud data into a pixel-level depth feature map. On the other hand, we combine the RGB data with the depth feature map and feed the data into a CNN. On the basis of the integrated RGB and depth data, we utilize a deep CNN to perform feature learning from the raw input information and obtain informative feature representations to classify objects in the autonomous vehicle environment. The proposed approach, in which visual data are fused with LiDAR data, exhibits superior classification accuracy over the approach using only RGB data or only depth data. During the training phase, using the LiDAR information can accelerate feature learning and hasten the convergence of the CNN on the target task. We perform experiments using the public dataset and demonstrate the effectiveness and efficiency of the proposed approach.

References
[1] Hrag-Harout Jebamikyous and Rasha Kashef, "Autonomous Vehicles Perception (AVP) Using Deep Learning: Modeling, Assessment, and Challenges", IEEE Access, Volume 10, 2022, doi: 10.1109/ACCESS.2022.3144407
[2] Javier Mendez, Miguel Molina, Noel Rodriguez, Manuel P. Cuellar and Diego P. Morales, "Camera-LiDAR Multi-Level Sensor Fusion for Target Detection at the Network Edge", Sensors 2021, 21, 3992, doi: 10.3390/s21123992
[3] Mingyu Park, Hyeonseok Kim and Seongkeun Park, "A Convolutional Neural Network-Based End-to-End Self-Driving Using LiDAR and Camera Fusion: Analysis Perspectives in a Real-World Environment", Electronics 2021, 10, 2608, doi: 10.3390/electronics10212608
[4] Yulong Cao, Ningfei Wang, Chaowei Xiao, Dawei Yang, Jin Fang, Ruigang Yang, Qi Alfred Chen, Mingyan Liu and Bo Li, "Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks", arXiv:2106.09249v1 [cs.CR], 17 Jun 2021
[5] G Ajay Kumar, Jin Hee Lee, Jongrak Hwang, Jaehyeong Park, Sung Hoon Youn and Soon Kwon, "LiDAR and Camera Fusion Approach for Object Distance Estimation in Self-Driving Vehicles", Symmetry 2020, 12, 324, doi: 10.3390/sym12020324
[6] You Li and Javier Ibanez-Guzman, "Lidar for Autonomous Driving", IEEE Signal Processing Magazine, 2020, doi: 10.1109/MSP.2020.2973615
[7] G Ajay Kumar, Jin Hee Lee, Jongrak Hwang, Jaehyeong Park, Sung Hoon Youn and Soon Kwon, "LiDAR and Camera Fusion Approach for Object Distance Estimation in Self-Driving Vehicles", Symmetry 2020, 12, 324, doi: 10.3390/sym12020324
[8] Jian Nie, Jun Yan, Huilin Yin, Lei Ren, and Qian Meng, "A Multimodality Fusion Deep Neural Network and Safety Test Strategy for Intelligent Vehicles", IEEE Transactions on Intelligent Vehicles, 2020
[9] Yulong Cao, Chaowei Xiao, Benjamin Cyr, Yimeng Zhou, Won Park, Sara Rampazzi, Qi Alfred Chen, Z. Morley Mao and Kevin Fu, "Adversarial Sensor Attack on LiDAR-based Perception in Autonomous Driving"
[10] Mhafuzul Islam, Mashrur Chowdhury, Hongda Li, and Hongxin Hu, "Vision-Based Navigation of Autonomous Vehicles in Roadway Environments with Unexpected Hazards"

G. Komali
M.Tech Scholar
CSE Department,
R.V.R & J.C College of Engineering,
Guntur, Andhra Pradesh, India.
[email protected]

Dr. A. Sri Nagesh
Professor,
CSE Department, R.V.R & J.C
College of Engineering,
Guntur, Andhra Pradesh, India.
[email protected]
