
Received May 9, 2022, accepted May 26, 2022, date of publication June 3, 2022, date of current version June 9, 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3179999

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

Universal Detection-Based Driving Assistance Using a Mono Camera With Jetson Devices
DUONG NGUYEN-NGOC TRAN, (Graduate Student Member, IEEE), LONG HOANG PHAM, (Member, IEEE),
HUY-HUNG NGUYEN, TAI HUU-PHUONG TRAN, HYUNG-JOON JEON,
AND JAE WOOK JEON, (Senior Member, IEEE)
Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon 16419, South Korea
Corresponding author: Jae Wook Jeon ([email protected])
This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea Government [Ministry of
Science and ICT (MSIT)] under Grant 2020R1A2C3011286.

ABSTRACT Advanced Driver Assistance Systems (ADAS) are a collection of intelligent solutions integrated
into next-generation vehicles to assist in safe driving. When building ADAS systems, the main goals are that
they are stable, flexible, easy to maintain, and allow for error tracing. If a driving assistance algorithm is
designed to be implemented on one machine or in one model, there is a potential disadvantage that if one
component fails, then the entire system would stop. We work on modularizing the ADAS system to be
flexible to accommodate any changes or improvements based on up-to-date requirements. Using advanced
current edge (or network) devices, we propose a Detection-based Driving Assistance algorithm, which can
collaborate or integrate with an existing system in a vehicle. The core of any process is to ensure that the
system has a predictable level of functionality and that any misbehavior can be easily traced to the root cause.
The proposed system shows fast, real-time performance on edge devices with limited computing power.

INDEX TERMS Advanced driver-assistance systems (ADAS), autonomous driving, scene understanding,
situational awareness, edge device.

I. INTRODUCTION
Advanced Driver Assistance Systems are intelligent systems located inside a vehicle that assist the human driver in various ways. These systems present essential information about traffic, closures, congestion of the roads ahead, congestion levels, and suggested routes that avoid congestion. These systems can also assess driver fatigue and distraction, and then provide precautionary warnings, assess driving performance, or make related recommendations. Advanced Driver Assistance Systems have become critical technologies studied in intelligent vehicles.

Most autonomous vehicle (AV) industry efforts focus on advanced driver assistance systems since they are the first step toward fully self-driving cars. The reports [1], [2] show critical reasons for car accidents. They found that human factors are the primary reason or a contributory element in 94% of car accidents. Vehicles, environmental factors, and other unknown reasons are responsible for 2% of crashes each. The three primary human factors most frequently cited in the study are speeding, inattentiveness, and improper lookout. Most of them can be avoided with ADAS. The role of ADAS is to prevent deaths and injuries by reducing the number of car accidents and reducing the severity of accidents that cannot be avoided. Essential safety-critical ADAS applications include pedestrian and vehicle detection/avoidance, lane departure warnings/corrections, and traffic light and traffic sign recognition. [3] shows that an ADAS system can have a crash avoidance effectiveness ranging from 9.3% to 33.3% for light vehicles [4], [5], while forward collision warnings (FCWs) may be able to prevent 23% to 50% of light vehicle rear-end crashes [6]. Moreover, [7] summarizes the effectiveness of twenty ADAS technologies for both light vehicles and heavy trucks. These lifesaving systems are vital to ensuring the success of ADAS applications, incorporating the latest interface standards, and running multiple vision-based algorithms to support real-time multimedia, vision co-processing, and sensor fusion subsystems.

These ADAS functions are usually based on one front camera or a front stereo-vision camera.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhongyi Guo.

TABLE 1. Hardware comparison of Jetson modules with Titan X.

FIGURE 1. Structure of ADAS.

Sometimes the camera information is supplemented with information from other sensors, like light detection and ranging (LIDAR) or radio detection and ranging (RADAR). ADAS cameras are typically located inside the car behind the central rearview mirror, against the front windshield. The ADAS camera field of view is located in the wiper area to keep the glass in front of the camera as clean as possible. Sometimes, RADAR sensing, vision sensing, and data fusion are combined in a single module.

This paper proposes ADAS using information obtained from a mono color camera and a cluster of edge devices, as shown in Fig. 1. The intelligent computation of ADAS is implemented using Machine Learning (ML) software that makes decisions based on critical observations of objects in the surroundings. The cluster of edge devices runs the modules, including traffic lights and signs, road markings and lanes, vehicles and pedestrians, synchronization, and scenarios. Then, the final caution appears on the display.

We experimented with optimizing performance in the testing stage using the Korean dataset provided by KATECH (Korea Automotive Technology Institute) [8], [9] by showing the time required for processing critical scenarios in ADAS. There are several key contributions of this work:
• We implement a modularized system that can flexibly accommodate any change and any improvement based on up-to-date requirements.
• The proposed thread-based approach demonstrably maintains stable operation because the crash or failure of one thread-based module cannot affect the performance of other parallel modules. This approach is adopted because any process that affects human safety must guarantee that the system has a predictable operational level, and that any malfunctions can be effortlessly traced to the root cause.
• The work shows very acceptable real-time performance on edge devices with limited computational power.

In terms of the physical world's specific design of both machines and technology, Operation Technology (OT) is the physical machines themselves and the systems that control, monitor, and interface with them. In the OT, the Operational Level is the manufacturing operations management, which manages the production workflow. An autonomous car must run continuously to capture any action outside the vehicle. If the vehicle process included sending and receiving cautions from a server, it would be dependent on the connection among many vehicles. Therefore, it would be problematic if there is any problem with the connection. We need AI embedded and edge devices that process the signals from the sensors on the vehicle in isolation. It should be emphasized that the requirements for real-world processing are in high demand, especially on embedded computational devices or edge devices. Manufacturing companies provide various application-specific integrated circuits, such as field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or graphics processing units (GPUs). In this study, the proposed method has been implemented and tested on an NVIDIA GPU-based computer and on NVIDIA embedded computing platforms: the Jetson TX2 [10], Jetson Xavier NX [11], and Jetson AGX Xavier [12]. The information for the Jetson devices is shown in Table 1.

The structure of this paper is as follows. Section II presents the related works. Section III presents the system architecture and describes each stage of the system. Next, the experiments and results are presented in Section IV. Finally, conclusions and directions for future work are discussed in Section V.

II. RELATED WORKS
In this section, we review common ADAS approaches and discuss their advantages and disadvantages. After describing the current boundaries, we demonstrate our strategy to improve on these limitations.

A. DRIVING ASSISTANCE ON ONE MACHINE
The procedure mainly focuses on building a model that can work with many tasks while maintaining high accuracy. In further detail, the model aims to learn better representations through information shared among multiple tasks. For illustration, a CNN-based multi-task


learning method mainly performs convolutional sharing of the network structure. MultiNet [13] completes the three scene perception tasks of scene classification, object detection, and segmentation of the driving area simultaneously by sharing an encoder and three independent decoders. DLT-Net [14] inherits the encoder-decoder structure and contributively constructs context tensors between sub-task decoders to share designated information among tasks. Moreover, [15] proposes one encoder for feature extraction and three decoders to handle the specific tasks. Meanwhile, it proposes a novel loss function to constrain the lane line to the outer contour of the lane area so that they will overlap geometrically. More importantly, the training paradigm of a multi-task model also requires very careful consideration. Reference [16] states that joint training is appropriate and beneficial only when all those tasks are indeed related; otherwise, it is necessary to adopt alternating optimization strategies.

However, in [17] and [18], the multi-task models have many issues: they require significantly more computational power to obtain higher performance and accuracy. Compared with a single-task model, they achieve lower accuracy because many tasks use only one feature extractor. Broadly, it is very difficult to simultaneously train many tasks and get a better result. This may be because the tasks must be learned at different rates or because one task may dominate the learning, leading to poor performance on other tasks. For keeping good outcomes without losing the performance of any process, one solution is to apply an individual model for each task. The level of performance demanded by ADAS platforms will require increasingly larger and more powerful GPUs, thus impacting the manufacturing bills of materials for autonomous vehicles. To mitigate this expense, platform vendors have sought to increase the value and functionality of the GPU by using it to perform multiple workloads in the vehicle. Virtualized GPUs have obvious applicability for autonomous vehicles and ADAS scenarios, as a single GPU can power multiple applications, from the visualization of maps and operations of entertainment consoles to the processing of environmental sensor data to identify roadway obstacles. However, enabling multiple virtual operations from a single GPU in automotive applications is only safe and effective if the GPU has rock-solid support for hardware-accelerated virtualization. Virtualization software is most dependable when hardware enforces entirely separate managed address spaces for each virtual instance and enables the restart, or flushing, of a single instance that is not operating correctly. This workload isolation is key to allowing the shared use of the GPU while keeping critical software, such as driver-assistance systems, from being corrupted by any other process.

ADAS running on one shared machine has many issues. First, processing many tasks on one machine may lead to high computation requirements. Also, when we upgrade each part of the ADAS, the whole system has to be updated. In a dangerous scenario, if a single task fails, it may lead to the crash of the whole system. Broadly speaking, it is hard to deploy a model that does not affect other models running on the same machine.

B. DEPLOYMENT ON EDGE COMPUTING
To keep satisfactory results without losing the performance of any operation and processing with low computation requirements, we can use Jetson clusters. In [19], they mention embedded edge computing by deploying a deep model on Jetson embedded boards. They compare the outcome with the computer simulation and get a comparable result with high-speed performance. For additional information, [20] notes that the core benefits of deploying a trained machine learning (ML) model on edge devices include: (1) the edge hardware is more energy-efficient since it requires fewer energy resources than computer and server machines; (2) edge-based inference hardware costs considerably less than other computational hardware such as field-programmable gate arrays (FPGA) and GPUs. In our paper, we deploy and run all modules of the ADAS on the Jetson cluster to keep the benefit of edge computing while obtaining good outcomes.

TABLE 2. Definition of classes in system.

III. PROPOSED METHOD
This section describes each module we use in the system, including traffic lights and signs, road markings and lanes, vehicles and pedestrians, synchronization, and scenarios. Each module has a specific mission and provides important information for driving assistance. For a more detailed description of the objects we consider in this system, Table 2 shows a list of the considered objects from the parent to child leaves.


FIGURE 2. The framework of the system.

FIGURE 3. Different stages of traffic lights, with or without arrow.


FIGURE 4. Display of ADAS.

A. PIPELINE OF SYSTEM
The overall details of the system pipeline are shown in Fig. 2. In the Main process, five threads have the job of keeping the system stable and running.
1) From the beginning, the frame from the dashcam is transferred to the Jetson AGX, which is responsible for the primary process. The Signal Thread's task is to connect and receive the buffer frame from the dashcam. After getting the input frame, the Synchronization Thread does the job of cropping the image to a specific ratio, sending it to the three threads for subsequent processing, and then waiting for the threads to finish. If the connection is down, it will reconnect to keep the system working.
2) The three threads, namely the Upper Thread, Middle Thread, and Lower Thread, correspond with the Light and Sign module, the Vehicle and Pedestrian module, and the Road Marking and Lane module, respectively. Each of the three threads sends the suitably cropped frame to the proper Jetson edge device for processing and waits for the information to return. If the information or result is not sent back in time, the thread will be terminated, a new thread will be created, and the input signal will be resent.
3) After all the threads successfully return, the Synchronization Thread combines all the information with the same time frame and sends it to the Scenarios module to investigate the situation and show the assistance information. Finally, all the assistance information appears on the display of the dashboard in the vehicle.

B. TRAFFIC LIGHT AND SIGN MODULE
The signal module covers the detection work for traffic lights and traffic signs. We only use a one-stage detector to make sure it runs in real time. We do not need a complex or large model because traffic lights and traffic signs have a similar appearance in most cases and do not vary widely, unlike a pedestrian or a car. The model we use is Scaled-YOLOv4 [21]. Likewise, we have used the TensorRT [22] inference optimizer and runtime for better optimization and to further reduce the inference time. We converted Scaled-YOLOv4 for model simplification and FP16 acceleration using the TensorRT network definition APIs, which are based on an up-to-date version of the operating system of the Jetson devices. In more detail, the model is first implemented in PyTorch, and we train and export a weights file from the model.
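For illustration only, the sketch below shows one common way to move a trained PyTorch detector onto TensorRT with FP16. It is a sketch under assumptions: the paper builds the network directly with the TensorRT network definition APIs and loads the exported weights, whereas this example takes the simpler ONNX-plus-trtexec route; the loader `load_scaled_yolov4`, the file names, and the input size are placeholders rather than values from the paper.

```python
import subprocess

import torch

# Hypothetical loader for the trained Scaled-YOLOv4 checkpoint; replace with the
# actual model definition and weights file used in your project.
from my_detector import load_scaled_yolov4  # assumption, not part of the paper


def export_fp16_engine(checkpoint_path: str = "scaled_yolov4.pt",
                       input_hw: tuple = (384, 1280)) -> None:
    """Export the PyTorch model to ONNX, then build an FP16 TensorRT engine."""
    model = load_scaled_yolov4(checkpoint_path).eval()

    # 1) Trace the trained PyTorch model into an ONNX graph.
    dummy = torch.zeros(1, 3, *input_hw)
    torch.onnx.export(model, dummy, "scaled_yolov4.onnx",
                      input_names=["images"], output_names=["predictions"],
                      opset_version=12)

    # 2) Build a TensorRT engine with FP16 enabled (run this on the target Jetson).
    subprocess.run(["trtexec",
                    "--onnx=scaled_yolov4.onnx",
                    "--saveEngine=scaled_yolov4_fp16.engine",
                    "--fp16"], check=True)


if __name__ == "__main__":
    export_fp16_engine()
```

On the Jetson devices, the resulting engine can then be deserialized by the TensorRT runtime and fed the cropped frames handed out by the Synchronization Thread.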


Afterward, we define the network using TensorRT, load the extracted weights file, and run the inference tasks. The lists of traffic light and traffic sign classes are shown in Fig. 3 and Fig. 5.

FIGURE 5. Different types of traffic signs. The first row shows Traffic Sign – Speed; the second to the seventh rows show Traffic Sign – Else.

The input frame has the shape $(w_f, h_f, c_f)$, which represents the width, height, and number of channels, respectively. For this module, we predefine three upper regions:

$ROI^u = \{roi_l, roi_m, roi_r\}$  (1)

with the corresponding factors $u_l, u_m, u_r$, where $u_l + u_m + u_r = 1$. Each region has a height equal to that of the sent cropped image, and the width is calculated by $w_i = u_i \times w_f$. The priority of the regions is defined by $roi_m > roi_r > roi_l$, which means that the middle region is the most important part, the second is the one on the right, and the last is the one on the left. Each bounding box has a value that matches the Pascal VOC [23] format, $bbox = \{x_{min}, y_{min}, x_{max}, y_{max}\}$. We use the three regions above to calculate the priority of traffic lights and traffic signs. Using the priority, we ascertain the leading traffic light or traffic sign for the assisted vehicle. In Fig. 4, the red, green, and blue bounding boxes represent the regions $roi_l$, $roi_m$, $roi_r$, respectively. Moreover, we denote that a point $p$ is inside a bounding box by the formula $p \in bbox$.

1) TRAFFIC LIGHT RECOGNITION
We only focus on non-occluded instances of traffic light detection to reduce ambiguities. All occluded traffic lights are removed from the training set and validation set to achieve this goal. In some cases, the deep learning networks still detect traffic lights on the boundaries of images. Our system uses two policies to decide whether or not a considered traffic light is irrelevant: in the case of the top boundary, more than half of the traffic light bulbs must be visible; when candidate signals are on the left or right edges of the images, all of the bulbs must be visible.

While in an intersection or roundabout, many traffic lights are detected and recognized with different stages. The issue is which one provides a signal meant for our car. We address this issue and increase recognition performance by adjusting the positions of the ROIs based on individual image analysis. By simplifying [24], we identify the region of interest (ROI) containing the traffic light in each frame using the vehicular pose and a prior traffic light pose. As shown in Fig. 4, the order of precedence is green, blue, and red. In each ROI, the priority is from above to below. The detected traffic lights are

$B^u_{light} = \{bbox^u_{light,1}, bbox^u_{light,2}, \ldots\}$  (2)

which have the center points $\{c^u_{light,1}, c^u_{light,2}, \ldots\}$ with $c^u_{light,i} = \{x^u_{light,c,i}, y^u_{light,c,i}\}$. We find the top-priority traffic light bounding box by these conditions: the bounding box belongs to the upper region when $c^u_{light,i} \in bbox^u_{roi_j}$ and $roi_j \in \{roi_l, roi_m, roi_r\}$,

$bbox^u_{light,i} > bbox^u_{light,j} \quad \text{where} \begin{cases} roi_i > roi_j \\ y^u_{light,i} < y^u_{light,j} & \text{with } roi_i = roi_j \end{cases}$  (3)

Therefore, the top-priority traffic light, which is considered the current one for the vehicle, is:

$bbox^u_{light,top} = \operatorname{argmax} B^u_{light}$  (4)
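As a minimal sketch of the priority rule in (1)-(4), the code below splits the upper crop into three vertical strips, assigns each detection to the strip containing its center, and keeps the highest box inside the highest-priority strip. It assumes Pascal-VOC-style axis-aligned boxes; the region factors and all names are illustrative, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # Pascal VOC style: (xmin, ymin, xmax, ymax)


def make_upper_rois(crop_w: float, crop_h: float,
                    u_l: float = 0.25, u_m: float = 0.5, u_r: float = 0.25) -> Dict[str, Box]:
    """Split the upper crop into left/middle/right strips (Eq. 1); u_l + u_m + u_r = 1.
    The example factors are placeholders, not values from the paper."""
    x1 = u_l * crop_w
    x2 = x1 + u_m * crop_w
    return {"roi_l": (0.0, 0.0, x1, crop_h),
            "roi_m": (x1, 0.0, x2, crop_h),
            "roi_r": (x2, 0.0, crop_w, crop_h)}


ROI_RANK = {"roi_m": 2, "roi_r": 1, "roi_l": 0}          # roi_m > roi_r > roi_l


def center(box: Box) -> Tuple[float, float]:
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0


def region_of(box: Box, rois: Dict[str, Box]) -> str:
    """A detection belongs to the ROI that contains its center point."""
    cx, cy = center(box)
    for name, (xmin, ymin, xmax, ymax) in rois.items():
        if xmin <= cx <= xmax and ymin <= cy <= ymax:
            return name
    return "roi_l"                                        # fallback: lowest priority


def top_traffic_light(lights: List[Box], rois: Dict[str, Box]) -> Optional[Box]:
    """Eqs. 3-4: prefer the higher-priority ROI; within the same ROI, prefer the
    box that sits higher in the image (smaller center y)."""
    if not lights:
        return None
    return max(lights, key=lambda b: (ROI_RANK[region_of(b, rois)], -center(b)[1]))
```

The same comparison, restricted to the middle and right regions, is reused later for the speed signs.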
2) TRAFFIC SIGN RECOGNITION
Traffic signs have different structures and forms in different countries; the essential types of traffic signs are prohibitory, danger, mandatory, and text-based signs. The prohibitory, danger, or mandatory signs often have standard shapes, such as circles, triangles, and rectangles, and often have standard colors, such as red, blue, and yellow. The text-based signs usually do not have fixed shapes and contain informative text. Based on the KATECH dataset, we only consider the Traffic Sign – Else class (including the danger, mandatory, and prohibitory classes) and the Traffic Sign – Speed class:
• Traffic Sign – Speed class: includes the speed limits in range. We take the localized sign from the detection result and recognize and classify it (as shown in Fig. 5 - (a)).
• Traffic Sign – Else class: is designed to provide warnings vividly and instantly, including prohibitory or mandatory restrictions (as shown in Fig. 5 - (b)).
In the traffic sign part, we only consider the main traffic sign speed for the vehicle because it is used for the post-process in the Scenarios module. After detection, we have the lists of bounding boxes

$B^u_{speed} = \{bbox^u_{speed,1}, bbox^u_{speed,2}, \ldots\}$  (5)
$B^u_{else} = \{bbox^u_{else,1}, bbox^u_{else,2}, \ldots\}$  (6)

which have the center points $\{c^u_{speed,1}, c^u_{speed,2}, \ldots\}$ with $c^u_{speed,i} = \{x^u_{speed,c,i}, y^u_{speed,c,i}\}$. We find the top-priority traffic sign – speed bounding box by these conditions: the bounding box belongs to the region if $c^u_{speed,i} \in bbox^u_{roi_j}$ and $roi_j \in \{roi_m, roi_r\}$,

$bbox^u_{speed,i} > bbox^u_{speed,j} \quad \text{where} \begin{cases} roi_i > roi_j \\ y^u_{speed,i} < y^u_{speed,j} & \text{with } roi_i = roi_j \end{cases}$  (7)

Therefore, the top-priority traffic sign – speed, which is considered the current bounding speed of the vehicle, is:

$bbox^u_{speed,top} = \operatorname{argmax} B^u_{speed}$  (8)

Because traffic sign – speed contains only digits, we use LPRNet [25] for fast Optical Character Recognition (OCR) and get the main speed limit value from the traffic sign – speed, which is

$OCR\left(bbox^u_{speed,top}\right)$  (9)

C. VEHICLE AND PEDESTRIAN MODULE
Automotive standards need to be followed for any system to obtain enhanced stability, predictability, and reliability. The most important priority is that the ADAS be safe and secure. Since the misbehavior of systems in a vehicle may result in hazardous situations for the passengers and for other vehicles or pedestrians on the road, care should be taken to ensure system reliability. We use the TensorRT version of Scaled-YOLOv4 to detect pedestrians and vehicles, the same version as the one in the Traffic Light and Sign module, and we simplify the model by the same process as in the Traffic Light and Sign module. The processing takes 2/3 of the image from the bottom up. Based on [26], reducing the computational complexity reduces the search space instead of limiting the window scale and position. Let us suppose that unnecessary portions of the image, including the image background and the areas of the scene where objects of interest are not expected, can be excluded from the search space. In that case, there will be considerable savings in computational cost. After detection, we get the sets of bounding boxes: $B^m_{car}$, $B^m_{bus}$, $B^m_{truck}$, $B^m_{motor}$, $B^m_{pedestrian}$, and $B^m_{bike}$.

D. ROAD SURFACE MARKING AND LANE MODULE

FIGURE 6. Visualization of lane detection.

1) LANE DETECTION
Reference [27] indicates that the lane detection algorithm must ensure good reliability, real-time operation, and robustness to meet practical requirements. With the development of autonomous driving technology, we need a lot of actual tests on the road. At the same time, a considerable overhead of resources will be consumed, and there are particular dangers to slow or erroneous computation.

We only keep 1/2 of the image from below to extract the result, and we use Ultra-Fast Structure-aware Deep Lane Detection [28] to get the lanes. We get the list of lanes

$L = \{l_1, l_2, \ldots\}$  (10)

in which $l_i = \{p_{i,1}, p_{i,2}, \ldots\}$. We find the ego lane to determine the main road marking in the post-process. Ego-lane detection detects the current lane and its boundary and is mainly applied online so that autonomously driving cars can stay in the current lane with the aid of lane departure detection. The white segments are drawn in Fig. 6, and the purple cover is the regular lane. To find the ego lane, we find the two lines nearest the screen centerline (the orange line in Fig. 4) on the left and right. The screen centerline is

$l_{sc}(x) = m_{sc} \times x + c_{sc}$  (11)

The position of a point is

$pos_{p_i} = l_{sc}(x_i) \begin{cases} < 0 & \text{on the left} \\ = 0 & \text{on the center} \\ > 0 & \text{on the right} \end{cases}$  (12)

The number of points on the left is

$num_{p_l} = \sum_i^n \left[ pos_{p_i} < 0 \right]$  (13)

and on the right is

$num_{p_r} = \sum_i^n \left[ pos_{p_i} > 0 \right]$  (14)

To find whether a line is on the left or the right of the screen centerline, we use

$pos_{l_i} = \begin{cases} \text{left} & \text{if } num_{p_l} > num_{p_r} \\ \text{right} & \text{if } num_{p_l} < num_{p_r} \end{cases}$  (15)

To find the nearest distance, we use the Hausdorff distance:

$d_H(l_i, l_{sc}) = \max\left\{ \max_{p_i \in l_i} \min_{p_{sc} \in l_{sc}} d(p_i, p_{sc}),\ \max_{p_{sc} \in l_{sc}} \min_{p_i \in l_i} d(p_i, p_{sc}) \right\}$  (16)

The nearest line on the left is $\operatorname{argmin} d_H(l_{i,left}, l_{sc})$, and the nearest line on the right is $\operatorname{argmin} d_H(l_{i,right}, l_{sc})$.


FIGURE 7. Different types of road marks.

The ego lane is

$L_{ego} = \left\{ l_{\operatorname{argmin} d_H(l_{i,left}, l_{sc})},\ l_{\operatorname{argmin} d_H(l_{i,right}, l_{sc})} \right\} = \left\{ l_{left,sc}, l_{right,sc} \right\}$  (17)
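As a rough illustration of (11)-(17), the sketch below picks the ego lane from a set of detected lanes by counting on which side of the screen centerline each lane's points fall and then taking the nearest left and right lines under a discrete Hausdorff distance. It assumes lanes are given as point lists in image coordinates and, for simplicity, a vertical centerline at x = w/2 instead of the general line of (11); the names are illustrative, not the authors' code.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Lane = List[Point]


def hausdorff(a: Lane, b: Lane) -> float:
    """Discrete symmetric Hausdorff distance between two point lists (Eq. 16)."""
    def directed(src: Lane, dst: Lane) -> float:
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


def ego_lane(lanes: List[Lane], frame_w: int, frame_h: int) -> Tuple[Optional[Lane], Optional[Lane]]:
    """Return the (left boundary, right boundary) of the ego lane (Eqs. 11-17)."""
    # Screen centerline sampled as a vertical point list at x = w / 2 (stand-in for Eq. 11).
    centerline: Lane = [(frame_w / 2.0, float(y)) for y in range(0, frame_h, 10)]

    left_side, right_side = [], []
    for lane in lanes:
        num_left = sum(1 for (x, _) in lane if x < frame_w / 2.0)    # Eq. 13
        num_right = sum(1 for (x, _) in lane if x > frame_w / 2.0)   # Eq. 14
        (left_side if num_left > num_right else right_side).append(lane)  # Eq. 15

    def nearest(group: List[Lane]) -> Optional[Lane]:
        return min(group, key=lambda l: hausdorff(l, centerline)) if group else None

    return nearest(left_side), nearest(right_side)                   # Eq. 17
```

The brute-force Hausdorff computation is quadratic in the number of points, which is acceptable here because each detected lane is only a short list of sampled points.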
2) ROAD MARKING RECOGNITION
Based on [29], to avoid conflicts between the arrows in traffic lights and the information on the road, we only consider five main directions of a traffic arrow: Straight, Left, Right, Straight Left, and Straight Right (as shown in Fig. 8). In addition, we add two more: U-turn and Else. Traffic arrow – Else means that other directions are different from the above arrows. Moreover, we consider the other road marks, such as crosswalk, number, and character (as shown in Fig. 7). The ''Road mark – Number'' shows the speed limit of the current lane. Using the lane information and the number given by OCR, the system can show the speed needed for this lane. After the detection, we get the detections $B^l_{number}$, $B^l_{character}$, and $B^l_{crosswalk}$, which have the center points $\{c^l_{number,1}, c^l_{number,2}, \ldots\}$ with $c^l_{number,i} = \{x^l_{number,c,i}, y^l_{number,c,i}\}$, and the points $\{c^l_{character,1}, c^l_{character,2}, \ldots\}$ with $c^l_{character,i} = \{x^l_{character,c,i}, y^l_{character,c,i}\}$. We only consider the post-process for Number and Character road markings. We filter these bounding boxes and keep those belonging to the ego lane, $L_{ego}$, by checking the center of each bounding box against the polygon created by the two lines of the ego lane. The list of points of the polygon is

$P_{ego} = \left\{ p^l_{left,sc,1}, p^l_{left,sc,2}, \ldots, p^l_{right,sc,1}, p^l_{right,sc,2}, \ldots \right\}$  (18)

The lists of bounding boxes that remain after filtering are

$B^l_{number,ego} = \left\{ bbox^l_{number} \mid c^l_{number} \in P_{ego} \right\}$  (19)
$B^l_{character,ego} = \left\{ bbox^l_{character} \mid c^l_{character} \in P_{ego} \right\}$  (20)

We find the top priority of the Number and Character by these conditions:

$bbox^l_{number,i} > bbox^l_{number,j} \quad \text{where } y^l_{number,c,i} > y^l_{number,c,j}$  (21)
$bbox^l_{character,i} > bbox^l_{character,j} \quad \text{where } y^l_{character,c,i} > y^l_{character,c,j}$  (22)

Therefore, the top priority is given by the pair:

$bbox^l_{number,top} = \max B^l_{number,ego}$  (23)
$bbox^l_{character,top} = \max B^l_{character,ego}$  (24)

We next consider the instruction from ''Road mark – Character.'' The letters and words of each country are different, so using individual modules makes this easy to change; we choose to show the closest Character on the panel.

FIGURE 8. Different types of road mark arrows.
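A small sketch of the ego-lane filtering and priority rule in (18)-(24): build the ego-lane polygon from the two boundary point lists, keep the marks whose centers fall inside it, and pick the one lowest in the image (largest center y) as the closest. Helper names are illustrative, and the point-in-polygon test is a standard ray-casting routine rather than anything specified by the paper.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def ego_polygon(left: List[Point], right: List[Point]) -> List[Point]:
    """Close the region between the two ego-lane boundaries (Eq. 18)."""
    return left + right[::-1]


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Standard ray-casting test for a point inside a simple polygon."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def center(box: Box) -> Point:
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0


def top_mark_in_ego_lane(marks: List[Box], poly: List[Point]) -> Optional[Box]:
    """Keep marks whose centers fall in the ego lane (Eqs. 19-20) and return the
    closest one, i.e. the one with the largest center y (Eqs. 21-24)."""
    in_lane = [b for b in marks if point_in_polygon(center(b), poly)]
    return max(in_lane, key=lambda b: center(b)[1]) if in_lane else None
```

Calling `top_mark_in_ego_lane` once on the Number detections and once on the Character detections yields the pair shown on the Speed Limit and Road Marking Character panels.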

E. SYNCHRONIZATION MODULE
Elaborating on the content of Fig. 2, we implement our proposed solution from the video input in a multi-threaded, CPU- and GPU-utilizing manner. As the four modules mentioned above are mostly independent of one another, we take advantage of the current hardware development for Internet of Things (IoT) devices with both multi-core CPU and CUDA-compatible GPU support and propose a thread-level parallelism framework [30]. Our approach performs best with at least five threads and a GPU, where each thread processes inputs continuously throughout the given frame sequence. From the primary device (Jetson AGX), we deploy the Synchronization Thread for receiving and sending signals for all necessary components of the solution (the other Jetson devices) and deploy the other four threads for continuous processing.
• Thread #1 (Signal Thread): This thread is responsible for receiving the input frame. This thread has to make sure that the connection with the camera is stable and online. If there is an error in the connection or buffer frame from the camera, this thread reconnects and waits for the signal.
• Thread #2 (Synchronization Thread): This thread is responsible for sending and receiving the cropped frames and the results from the other threads. It is the most crucial thread because it can terminate and create a new thread. The thread must keep the time frames in order and the results corresponding with each time frame. After receiving the input frame, the thread crops it and sends it to the other threads, and then it waits for the results. After getting a result back, the thread packs it with the corresponding time frame and sends it to the Scenarios module on the Jetson TX2. If one of the modules does not send its result on time, the thread calls the primary process to terminate the non-responsive thread and creates a new one.
• Thread #3 (Upper Thread): This thread is responsible for sending and receiving the Light and Sign module result from the Jetson NX. The edge device performs the object detection for the traffic light and traffic sign by running the traffic light and sign module (mentioned above). Finally, the result is sent back to the edge device primary process, and the upper thread handles it and sends it to the Synchronization Thread.
• Thread #4 (Middle Thread): This thread is responsible for sending and receiving the result from the Vehicle and Pedestrian module from the Jetson AGX. The edge device performs object detection for cars, buses, trucks, and pedestrians by running the vehicle and pedestrian module (mentioned above). Finally, the result is sent back to the edge device primary process, and the middle thread handles it and sends it to the Synchronization Thread.
• Thread #5 (Lower Thread): This thread is responsible for sending and receiving the result of the Road Marking and Lane module from the Jetson AGX. The edge device performs both lane detection and road marking detection. Finally, the result is sent back to the edge device primary process, and the lower thread handles it and sends it to the Synchronization Thread.
In case of adding a new module in the future, a new thread can be added to the current threads, and the Synchronization Thread would handle it.
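The sketch below illustrates this thread-level organization in plain Python: a synchronization loop hands each cropped frame to the worker threads over queues, waits with a timeout, and restarts any worker that does not answer in time. It is a structural sketch only; the queues stand in for the actual device-to-device links between the Jetson boards, and all names are illustrative rather than the authors' implementation.

```python
import queue
import threading
from typing import Callable, Dict


class ModuleWorker:
    """One detection module in its own thread; a hang or crash here does not
    stop the main loop, which simply restarts the worker."""

    def __init__(self, name: str, process_fn: Callable):
        self.name, self.process_fn = name, process_fn
        self._spawn()

    def _spawn(self) -> None:
        self.inbox: queue.Queue = queue.Queue()
        self.outbox: queue.Queue = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True, name=self.name)
        self.thread.start()

    def _run(self) -> None:
        while True:
            frame_id, crop = self.inbox.get()
            self.outbox.put((frame_id, self.process_fn(crop)))

    def restart(self) -> None:              # mimics "terminate and create a new thread"
        self._spawn()


def synchronization_loop(frames, workers: Dict[str, ModuleWorker], scenarios_fn,
                         crop_fns: Dict[str, Callable], timeout: float = 0.1) -> None:
    """Crop each frame, dispatch to the upper/middle/lower workers, collect the
    results that belong to the same time frame, and forward them to the
    Scenarios module; a late worker is restarted instead of blocking the loop."""
    for frame_id, frame in enumerate(frames):
        for name, worker in workers.items():
            worker.inbox.put((frame_id, crop_fns[name](frame)))

        results = {}
        for name, worker in workers.items():
            try:
                fid, out = worker.outbox.get(timeout=timeout)
                if fid == frame_id:
                    results[name] = out
            except queue.Empty:             # no answer in time: restart and move on
                worker.restart()
        scenarios_fn(frame_id, results)
```

The design point this mirrors is that the Synchronization Thread owns the time-frame bookkeeping, so a missing result only degrades one frame's output instead of stalling the whole pipeline.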
F. SCENARIOS MODULE
This section shows the work of the Scenarios module on the Jetson TX2. The module processes all the information and displays it along with the assistance signal [31], [32]. The scenarios can be updated and customized according to the regulations of each country. From Fig. 4, the display of the ADAS has five components, and each of them has a result from this module.
• Direction panel: The panel shows the current direction of the lane based on the default or the road marking arrow on the ego lane (shown in Fig. 8).
• Road marking character panel: The panel shows any words or characters in the ego lane. If we have many of them, it shows the lowest and closest word in the ego lane.
• Traffic light panel: This panel shows the current traffic light. The current light governing the vehicle is the result of the traffic light module process.
• Speed limit panel: The panel shows the speed limit. The default value is 60 kph, which is the speed limit for vehicles on most city streets and rural two-lane roads in Korea. The value is changed based on the Traffic Light and Sign module and the Road Marking and Lane module. We show the speed limit in kilometers per hour (km/h).
• Caution panel: The panel shows the assistance message from the ADAS. There are three main caution messages: NORMAL, WARNING, and DANGER. The NORMAL indicator means that the driving condition is safe. The WARNING indicator means that the driver must consider slowing down and pay more attention to obstacles or objects on the road. The DANGER indicator means that the driver should be ready to brake, and potential collisions and danger lie in front of the car. The danger signal indicates danger for the driver, pedestrians, or other drivers.
We prioritize caution: the first priority is to treat the pedestrian with caution, and the second is to treat the vehicle with caution. The traffic light and traffic sign serve as supporting pieces of information that encourage driver caution.

1) PEDESTRIAN CAUTION
The scenarios module always initiates before the detection of any case of a pedestrian. In many cases, the caution signal regarding a pedestrian is stop. For example, such cases could be a pedestrian in the crosswalk in front of the car, or a pedestrian in any lane (especially the ego lane). The way to determine the pedestrian state is by using Intersection over Union (IoU) for object detection. The equations to check the pedestrian's state in front of a car are

$IoU\left(bbox^m_{pedestrian,i}, bbox^l_{crosswalk,j}\right) = bbox^m_{pedestrian,i} \cap bbox^l_{crosswalk,j}$  (25)
$IoU\left(bbox^m_{bike,i}, bbox^l_{crosswalk,j}\right) = bbox^m_{bike,i} \cap bbox^l_{crosswalk,j}$  (26)
$IoU\left(bbox^m_{pedestrian,i}, L_{ego}\right) = bbox^m_{pedestrian,i} \cap L_{ego}$  (27)
$IoU\left(bbox^m_{bike,i}, L_{ego}\right) = bbox^m_{bike,i} \cap L_{ego}$  (28)

For example, in the case of Fig. 9 - (a)(b), the pedestrians go over the crosswalk, and (25) has a value greater than zero.
FIGURE 9. Specific cases. (a)-(c) The pedestrians are in crosswalks. (d) The pedestrian is in the ego-lane. (e) A motorbike is in front of the vehicle. (f)-(h) Multiple and different types of traffic lights. (i) The motorbike is parked on the pavement. (k)-(l) The bus suddenly changes lanes.

In the case of Fig. 9 - (c), the value of (26) is greater than zero. In the case of Fig. 9 - (d), the bike goes over the ego lane, and the value of (27) is greater than zero. If one of these IoUs has a value greater than zero, the caution is DANGER. If the IoU is with a lane other than the ego lane (while (27) and (28) are still zero), we use (29) and (30):

$IoU\left(bbox^m_{pedestrian,i}, L\right) = bbox^m_{pedestrian,i} \cap L$  (29)
$IoU\left(bbox^m_{bike,i}, L\right) = bbox^m_{bike,i} \cap L$  (30)

If case (29) has a value of more than zero (as shown in Fig. 9 - (j)), the caution is WARNING.

2) VEHICLE CAUTION
For vehicle caution, the top priority is the motorbike, then the car and bus, and the last is the truck. We have a union of detections in vehicle detection:

$B^m_{vehicle} = B^m_{motor} \cup B^m_{car} \cup B^m_{bus} \cup B^m_{truck}$  (31)

The equation to check the position of the vehicle is

$pos_{vehicle,i} = \dfrac{y^m_{vehicle,i}}{h_f}$  (32)

The caution is WARNING whenever any vehicle in the ego lane is at more than 1/2 of $h_f$ (Fig. 9 - (k)), and the caution is DANGER whenever any vehicle in the ego lane is at more than 2/3 of $h_f$ (as shown in Fig. 9 - (l)).
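To make the caution rules concrete, the sketch below collapses the pedestrian checks (25)-(30) and the vehicle rule (31)-(32) into one decision function. It treats the paper's IoU simply as "overlap area greater than zero", uses the bottom-center of a box as a stand-in for its intersection with a lane polygon, and takes the 1/2 and 2/3 thresholds from the text above; it is an illustrative reduction with placeholder names, not the authors' code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]        # (xmin, ymin, xmax, ymax)
Polygon = List[Tuple[float, float]]


def overlap_area(a: Box, b: Box) -> float:
    """Box-box intersection area; the caution rules only ask whether it is > 0."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def point_in_polygon(pt: Tuple[float, float], poly: Polygon) -> bool:
    """Ray-casting point-in-polygon test (same helper as in the earlier sketch)."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def in_lane(box: Box, lane_poly: Polygon) -> bool:
    """Stand-in for bbox ∩ lane: is the box's bottom-center inside the lane polygon?"""
    return point_in_polygon(((box[0] + box[2]) / 2.0, box[3]), lane_poly)


def caution_message(pedestrians: List[Box], bikes: List[Box], crosswalks: List[Box],
                    vehicles: List[Box], ego_poly: Polygon,
                    other_lanes: List[Polygon], frame_h: float) -> str:
    people = pedestrians + bikes
    # Pedestrian/bike on a crosswalk (25)-(26) or in the ego lane (27)-(28) -> DANGER.
    if any(overlap_area(p, c) > 0 for p in people for c in crosswalks):
        return "DANGER"
    if any(in_lane(p, ego_poly) for p in people):
        return "DANGER"
    # Pedestrian/bike in some other lane (29)-(30) -> WARNING.
    if any(in_lane(p, lane) for p in people for lane in other_lanes):
        return "WARNING"
    # Vehicle rule (31)-(32): normalized vertical position of ego-lane vehicles.
    positions = [v[3] / frame_h for v in vehicles if in_lane(v, ego_poly)]
    if any(pos > 2.0 / 3.0 for pos in positions):
        return "DANGER"
    if any(pos > 1.0 / 2.0 for pos in positions):
        return "WARNING"
    return "NORMAL"
```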
IV. EXPERIMENT RESULT
In this section, we evaluate the performance in analyzing some scenarios while driving to show the effectiveness of the proposed ADAS. In addition to showing caution messages on the display, we show the speed performance of the selected models running on the Jetson devices.

A. EXPERIMENTAL SETTING
The experiment uses the KATECH dataset, which contains more than 160,000 images for the training, validation, and testing sets. The image resolutions in the dataset are 1280 × 720 pixels, 1280 × 672 pixels (1280), or 1920 × 1080 pixels. The capture times of the images are daytime, dawn, and nighttime. Moreover, the dataset includes sunny, overcast, and rainy weather. We set up three Jetson AGX Xavier devices, one Jetson Xavier NX device, and one Jetson TX2 device. The input resolution we used was 1280, and the output is displayed at the end. We labeled 10,000 images for the scenario test case. The labels are based on the four panels in the final display. Six direction types are labeled for the Direction panel, including Straight, Left, Right, Straight-Left, Straight-Right, and U-turn. For the Traffic Light panel, the varieties are shown in Fig. 3, including Red, Yellow, Green, Red Arrow, and Green Arrow. For the Speed Limit panel, we labeled it in the range of 30 to 150 kph. For the Caution Message panel, we labeled the Driving messages (including Safety, Warning, and Danger) and the Icon messages (including Vehicle, Cycle, and Pedestrian).

B. CASE SCENARIOS
Fig. 10 shows some scenarios on the KATECH dataset that we have highlighted.

FIGURE 10. Visualization of scenarios in the series of frames.

In the case of Fig. 10 - (a), the pedestrian is crossing the street without observation. The ADAS detects and alerts the driver about the pedestrian in the crosswalk with the WARNING signal. In the third image, the vehicle is near the walker, so the ADAS alerts DANGER and asks the driver to be ready to stop. In the last image, while the pedestrian is on


TABLE 3. Accuracy of panels.

TABLE 4. Accuracy (mAP) comparison of traffic object detection.

TABLE 5. Speed performance of the system.

TABLE 6. Comparison of devices power consumption.

the sidewalk, the caution panel returns to WARNING, and the vehicle continues to drive without the DANGER alert.

In the case of Fig. 10 - (b), the vehicle can turn left with a speed limit of 50 km/h. However, when the left turn is ongoing, the ADAS detects pedestrians on the crosswalk in front, the WARNING caution is displayed, and the vehicle has to slow down. Then, with the crosswalk near the car, the DANGER caution is displayed, and it waits for all pedestrians to cross the street; then it turns to the NORMAL indicator.

In Fig. 10 - (c), the ADAS detects one motorbike near the vehicle and in the ego-lane, and the WARNING caution is shown. Then, the NORMAL indicator is displayed when the motorbike is at a safe distance from the vehicle.

In Fig. 10 - (d), the WARNING caution alerts the driver that the red light is on and that the vehicle must slow down and stop. Nevertheless, the NORMAL indicator allows the vehicle to safely run while reaching the intersection when the green light is on.

In Fig. 10 - (e), a dangerous situation is presented because the bus changes lanes too fast to reach the bus stop. The ADAS alerts DANGER for the driver to be ready to stop. The caution indicator returns to NORMAL again after the bus is at the bus stop.

Finally, the system's accuracy on the scenario cases is shown in Table 3. Most of the numbers are higher than 90, proving that the system runs well in most cases.

C. PERFORMANCE
Besides the accuracy on the case scenarios, we measure the accuracy in traffic object detection of our system against the multi-task models, as shown in Table 4. As can be seen, DLT-Net and YOLOP have lower mAP than ours because both models have to share the feature extractor with other tasks, and the training step for a multi-task model is more complicated than for the specialized model.

Because these algorithms run in independent modules on hardware with limited power, the real-time performance of the system must be maintained. We find that the proposed system satisfies real-time reactions to the outside environment. Table 5 shows the processing time of each case while using or neglecting to use each module in the system. In the meantime, Table 6 shows the recommended system power for each case. Therefore, in the case of running all modules on one machine with one or three Titan X GPUs, the energy required is higher than for the group of Jetson devices. Additionally, the stack of models may demand more memory than one Titan X's available memory, which means we need more than one GPU and more power consumption in one system.

V. CONCLUSION AND FUTURE WORK
This paper proposes a modular system that can flexibly implement any changes or improvements based on updated requirements. Herein, experiments show that the proposed ADAS maintains stability and that a crash in one module cannot affect the performance of others. The core tenet of any process that affects human safety is to ensure that the system has a predictable level of performance and that any misbehavior can be easily traced to the root cause. The work shows good execution speed with proper timing on


edge devices. In the future, we will upgrade the modules to analyze more traffic rules and attempt to allow the ADAS to transact with the physical driving system, by implementing emergency braking, for instance, to improve the level of the autonomous vehicle system.

REFERENCES
[1] L. Dieter, ''Saving lives: Boosting car safety in the EU,'' KOCH, Eur. Parliament, Strasbourg, France, Tech. Rep. 2017/2085(INI), Nov. 2017.
[2] T. Stewart, ''Overview of motor vehicle crashes in 2020,'' U.S. Dept. Transp. Nat. Highway Traffic Saf. Admin., NHTSA Tech. Rep. DOT HS 813 266, Mar. 2022.
[3] L. Yue, M. A. Abdel-Aty, Y. Wu, and A. Farid, ''The practical effectiveness of advanced driver assistance systems at different roadway facilities: System limitation, adoption, and usage,'' IEEE Trans. Intell. Transp. Syst., vol. 21, no. 9, pp. 3859–3870, Sep. 2020.
[4] E. Nodine, A. Lam, S. Stevens, M. Razo, and W. Najm, ''Integrated vehicle-based safety systems (IVBSS) light vehicle field operational test independent evaluation,'' United States Nat. Highway Traffic Saf. Admin., Tech. Rep. DOT-VNTSC-NHTSA-11-02; DOT HS 811 516, Oct. 2011.
[5] T. Gordon, H. Sardar, D. Blower, M. L. Aust, Z. Bareket, M. Barnes, A. Blankespoor, I. Isaksson-Hellman, J. Ivarsson, B. Juhas, K. Nobukawa, and H. Theander, ''Advanced crash avoidance technologies (ACAT) program—Final report of the Volvo-Ford-UMTRI project: Safety impact methodology for lane departure warning—Method development and estimation of benefits,'' United States Nat. Highway Traffic Saf. Admin., Washington, DC, USA, Tech. Rep. DOT HS 811 405, 2010.
[6] J. B. Cicchino, ''Effectiveness of forward collision warning and autonomous emergency braking systems in reducing front-to-rear crash rates,'' Accident Anal. Prevention, vol. 99, pp. 142–152, Feb. 2017.
[7] L. Yue, M. Abdel-Aty, Y. Wu, and L. Wang, ''Assessment of the safety benefits of vehicles' advanced driver assistance, connectivity and low level automation systems,'' Accident Anal. Prevention, vol. 117, pp. 55–64, Aug. 2018.
[8] Korea Automotive Technology Institute. Leading a Future With Creativity & Innovation. Accessed: Jan. 4, 2022. [Online]. Available: http://www.katech.re.kr/eng
[9] D. N.-N. Tran, H.-H. Nguyen, L. H. Pham, and J. W. Jeon, ''Object detection with deep learning on drive PX2,'' in Proc. IEEE Int. Conf. Consum. Electron.-Asia (ICCE-Asia), Seoul, South Korea, Nov. 2020, pp. 1–4.
[10] Jetson TX2 Module. Accessed: Jan. 17, 2022. [Online]. Available: https://developer.nvidia.com/embedded/jetson-tx2
[11] Jetson Xavier NX Developer Kit. Accessed: Jan. 17, 2022. [Online]. Available: https://developer.nvidia.com/embedded/jetson-xavier-nx-devkit
[12] Jetson AGX Xavier Developer Kit. Accessed: Jan. 17, 2022. [Online]. Available: https://developer.nvidia.com/embedded/jetson-agxxavier-developer-kit
[13] M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, and R. Urtasun, ''MultiNet: Real-time joint semantic reasoning for autonomous driving,'' May 2016, arXiv:1612.07695.
[14] Z. Kang, K. Grauman, and F. Sha, ''Learning with whom to share in multi-task feature learning,'' in Proc. 28th Int. Conf. Mach. Learn., Madison, WI, USA, 2011, pp. 521–528.
[15] D. Wu, M. Liao, W. Zhang, X. Wang, X. Bai, W. Cheng, and W. Liu, ''YOLOP: You only look once for panoptic driving perception,'' 2021, arXiv:2108.11250.
[16] N. K. Ragesh and R. Rajesh, ''Pedestrian detection in automotive safety: Understanding state-of-the-art,'' IEEE Access, vol. 7, pp. 47864–47890, 2019.
[17] C. Fifty, E. Amid, Z. Zhao, T. Yu, R. Anil, and C. Finn, ''Efficiently identifying task groupings for multi-task learning,'' in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 34, 2021, pp. 27503–27516.
[18] T. Standley, A. R. Zamir, D. Chen, L. Guibas, J. Malik, and S. Savarese, ''Which tasks should be learned together in multi-task learning?'' in Proc. 37th Int. Conf. Mach. Learn. (ICML), vol. 119, Jul. 2020, pp. 9120–9132.
[19] S. Ullah and D.-H. Kim, ''Federated learning using sparse-adaptive model selection for embedded edge computing,'' IEEE Access, vol. 9, pp. 167868–167879, 2021.
[20] M. A. Farooq, P. Corcoran, C. Rotariu, and W. Shariff, ''Object detection in thermal spectrum for advanced driver-assistance systems (ADAS),'' IEEE Access, vol. 9, pp. 156465–156481, 2021.
[21] C. Y. Wang, A. Bochkovskiy, and H. Y. M. Liao, ''Scaled-YOLOv4: Scaling cross stage partial network,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13029–13038.
[22] NVIDIA TensorRT for Developers. Accessed: Jan. 18, 2022. [Online]. Available: https://developer.nvidia.com/tensorrt
[23] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, ''The Pascal visual object classes (VOC) challenge,'' Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[24] A. Avramovic, D. Sluga, D. Tabernik, D. Skocaj, V. Stojnic, and N. Ilc, ''Neural-network-based traffic sign detection and recognition in high-definition images using region focusing and parallelization,'' IEEE Access, vol. 8, pp. 189855–189868, 2020.
[25] S. Zherzdev and A. Gruzdev, ''LPRNet: License plate recognition via deep neural networks,'' Jun. 2018, arXiv:1806.10447.
[26] S. Lee, M. Younis, A. Murali, and M. Lee, ''Dynamic local vehicular flow optimization using real-time traffic conditions at multiple road intersections,'' IEEE Access, vol. 7, pp. 28137–28157, 2019.
[27] N. Ma, G. Pang, X. Shi, and Y. Zhai, ''An all-weather lane detection system based on simulation interaction platform,'' IEEE Access, vol. 8, pp. 46121–46130, 2020.
[28] Z. Qin, H. Wang, and X. Li, ''Ultra fast structure-aware deep lane detection,'' in Computer Vision—ECCV, vol. 12369. Cham, Switzerland: Springer, 2020, pp. 276–291.
[29] Y. Qian, J. M. Dolan, and M. Yang, ''DLT-Net: Joint detection of drivable areas, lane lines, and traffic objects,'' IEEE Trans. Intell. Transp. Syst., vol. 21, no. 11, pp. 4670–4679, Nov. 2020.
[30] D. N.-N. Tran, L. H. Pham, H.-H. Nguyen, T. H.-P. Tran, H.-J. Jeon, and J. W. Jeon, ''A region-and-trajectory movement matching for multiple turn-counts at road intersection on edge device,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Nashville, TN, USA, Jun. 2021, pp. 4082–4089.
[31] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, ''A survey of autonomous driving: Common practices and emerging technologies,'' IEEE Access, vol. 8, pp. 58443–58469, 2020.
[32] S. Jeon, J. Son, M. Park, B. S. Ko, and S. H. Son, ''Driving-PASS: A driving performance assessment system for stroke drivers using deep features,'' IEEE Access, vol. 9, pp. 21627–21641, 2021.

DUONG NGUYEN-NGOC TRAN (Graduate Student Member, IEEE) received the B.S. degree in computer science and the M.S. degree in information technology management from International University, Ho Chi Minh City, Vietnam, in 2014 and 2018, respectively. He is currently pursuing the Ph.D. degree in electrical and computer engineering with Sungkyunkwan University, Suwon, South Korea. His current research interests include computer vision, image processing, and deep learning.

LONG HOANG PHAM (Member, IEEE) received the B.S. degree in computer science and the M.S. degree in information technology management from International University, Ho Chi Minh City, Vietnam, in 2013 and 2017, respectively, and the Ph.D. degree in electrical and computer engineering from Sungkyunkwan University, Suwon, South Korea, in 2021. His current research interests include computer vision, image processing, and deep learning.


HUY-HUNG NGUYEN received the B.S. degree in computer science and the M.E. degree in information technology management from International University—Vietnam National University, Vietnam, in 2014 and 2017, respectively. He is currently pursuing the Ph.D. degree in electrical and computer engineering with Sungkyunkwan University, Suwon, South Korea. His research interests include computer vision, image processing, and deep learning.

TAI HUU-PHUONG TRAN received the B.S. degree in computer science and engineering from International University—Vietnam National University, Vietnam, in 2015. He is currently pursuing the Ph.D. degree in electrical and computer engineering with Sungkyunkwan University, Suwon, South Korea. His current research interests include image processing, computer vision, and deep learning.

HYUNG-JOON JEON received the B.S. degree in computer science and engineering from Yonsei University, Seoul, South Korea, in 2013, and the M.S. degree in electrical and computer engineering from Sungkyunkwan University, Suwon, South Korea, in 2019, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. From 2013 to 2016, he was an Engineer with Samsung Electronics, Suwon. His research interests include computer vision, deep learning, and system software.

JAE WOOK JEON (Senior Member, IEEE) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, South Korea, in 1984 and 1986, respectively, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1990. From 1990 to 1994, he was a Senior Researcher with Samsung Electronics, Suwon, South Korea. Since 1994, he has been with Sungkyunkwan University, Suwon, where he was first an Assistant Professor with the School of Electrical and Computer Engineering and is currently a Professor with the School of Information and Communication Engineering. His current research interests include robotics, embedded systems, and factory automation.
