Driver Behavior Analysis Based on Real On-Road Driving Data in the Design of Advanced Driving Assistance Systems
Scholarship@Western
11-15-2022 3:30 PM
Recommended Citation
Khairdoost, Nima, "Driver Behavior Analysis Based on Real On-Road Driving Data in the Design of
Advanced Driving Assistance Systems" (2022). Electronic Thesis and Dissertation Repository. 9088.
https://ir.lib.uwo.ca/etd/9088
This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted
for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of
Scholarship@Western. For more information, please contact [email protected].
Abstract
The number of vehicles on the roads increases every day. According to the
National Highway Traffic Safety Administration (NHTSA), the overwhelming
majority of serious crashes (over 94 percent) are caused by human error. The
broad aim of this research is to develop a driver behavior model using real on-
road data in the design of Advanced Driving Assistance Systems (ADASs). For
several decades, these systems have been a focus of many researchers and vehi-
cle manufacturers in order to increase vehicle and road safety and assist drivers
in different driving situations. Some studies have concentrated on drivers as
the main actor in most driving circumstances. The way a driver monitors
the traffic environment partially indicates the level of driver awareness. As
an objective, we carry out a quantitative and qualitative analysis of driver
behavior to identify the relationship between a driver’s intention and his/her
actions. The RoadLAB project developed an instrumented vehicle equipped
with On-Board Diagnostic systems (OBD-II), a stereo imaging system, and a
non-contact eye tracker system to record synchronized driving data on
the driver's cephalo-ocular behavior, the vehicle itself, and the traffic environment.
We analyze several behavioral features of the drivers to identify potentially
relevant relationships between driver behavior and the anticipation of the next
driver maneuver, as well as to reach a better understanding of driver behavior
while in the act of driving. Moreover, we detect and classify road lanes in
urban and suburban areas, as they provide contextual information. Our
experimental results show that our proposed models achieve an F1 score of
84% for driver maneuver prediction and an accuracy of 94% for lane type
classification.
Summary for Lay Audience
The large number of vehicle collisions leads to tremendous human
and economic costs. Road traffic injury is the leading cause of death among
children and young people aged 5-29 years, and road fatalities are the eighth
leading cause of death across all age groups. Evidence has shown that a
significant number of vehicle accidents are due to driver error. The broad aim
of this research is to develop a driver behavior model using real on-road data
in the design of Advanced Driving Assistance Systems (ADASs). In many
driving situations, drivers may receive an alert from their passengers to avoid
an accident with another vehicle or a pedestrian. This role can be played by
an intelligent ADAS, which warns the driver or even intervenes if it finds
it necessary. An intelligent ADAS can understand and benefit from valuable
information including the state of the driver’s behavior, the vehicle, and the
environment to analyze driver behavior in different driving situations as well
as to predict driver maneuvers. We analyze several behavioral features of the
drivers to realize the potential relevant relationship between driver behavior
and the anticipation of the next driver maneuver as well as to reach a better
understanding of driver behavior while in the act of driving.
List of Acronyms
2D - two-dimensional
3D - three-dimensional
ABS - Anti-lock Braking System
ACC - Adaptive Cruise Control
ACF - Aggregated Channel Features
ADAS - Advanced Driving Assistance System
AIDS - Acquired Immunodeficiency Syndrome
AIO-HMM - Autoregressive Input-Output Hidden Markov Model
ANN - Artificial Neural Network
AV - Autonomous Vehicle
BN - Bayesian Network
BSD - Blind Spot Detection
CANbus - Controller Area Network bus protocol
CNN - Convolutional Neural Network
DBN - Dynamic Bayesian Network
DBRNN - Deep Bidirectional Recurrent Neural Network
DR - Detection Rate
EBA - Emergency Brake Assist
ECU - Electronic Control Unit
EEG - Electroencephalogram
EOG - Electrooculogram
ESC - Electronic Stability Control system
FC - Fully Connected neural network
FCW - Forward Collision Warning
FPPF - False Positives Per Frame
FPR - False Positive Rate
F-RNN-EL - Fusion-Recurrent Neural Network Exponential Loss
F-RNN-UL - Fusion-Recurrent Neural Network Uniform Loss
GA - Genetic Algorithm
GAN - Generative Adversarial Network
GC - Global Context
GPR - Gaussian Process Regression
GPS - the Global Positioning System
GRU - Gated Recurrent Unit
HA - Highway Assist
HEM - Hard Examples Mining
HG - Hypothesis Generation
HIV - Human Immunodeficiency Virus
HOG - Histogram of Oriented Gradients
HV - Hypothesis Verification
IO-HMM - Input Output Hidden Markov Model
IPM - Inverse Perspective Mapping
IR - Infrared Radiation
LBP - Local Binary Patterns
LC - Lane Centering
LDW - Lane Departure Warning
Lidar - Light Detection And Ranging
LoG - Line of Gaze
LSTM - Long Short-Term Memory
MPE - Mean Prediction Error
NDS - Naturalistic Driving Study
NHTSA - National Highway Traffic Safety Administration
NMS - Non Maximum Suppression
OBD-II - On-Board Diagnostic system
PHOG - Pyramid Histogram of Oriented Gradients
PIA - Percentage of Inside Area
PoG - Point of Gaze
RA - Reinforced Attention
Radar - Radio Detection And Ranging
R-CNN - Region-Based Convolutional Neural Network
R-FCN - Region-based Fully Convolutional Network
RNN - Recurrent Neural Network
ROC - Receiver Operating Characteristic curve
ROI - Region Of Interest
RVM - Relevance Vector Machine
SAE - the Society of Automotive Engineers
SHRP 2 - the second Strategic Highway Research Program
SIFT - Scale Invariant Feature Transforms
S-RNN - Simple Recurrent Neural Network
SURF - Speeded Up Robust Features
SVM - Support Vector Machines
TPR - True Positive Rate
UBI - Usage-Based Insurance
WHO - World Health Organization
YOLO - You Only Look Once
Acknowledgements
Contents
Abstract ii
List of Acronyms iv
Acknowledgements vi
Contents vii
List of Tables xi
1 Introduction 1
1.1 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Driver Behavior Analysis Applications . . . . . . . . . 4
Vehicle-Oriented Applications . . . . . . . . . . . . . . 4
Management-Oriented Applications . . . . . . . . . . . 5
Driver-Oriented Applications . . . . . . . . . . . . . . 6
1.1.2 Advanced Driver Assistance Systems (ADASs) . . . . . 7
Level 0 (No Driving Automation) . . . . . . . . . . . . 8
Level 1 (Driver Assistance) . . . . . . . . . . . . . . . 8
Level 2 (Partial Driving Automation) . . . . . . . . . 9
Level 3 (Conditional Driving Automation) . . . . . . . 9
Level 4 (High Driving Automation) . . . . . . . . . . . 10
Level 5 (Full Driving Automation) . . . . . . . . . . . 10
1.1.3 Driver Maneuver Prediction . . . . . . . . . . . . . . . 11
Models for Driver Maneuver Prediction . . . . . . . . . 12
Cognitive Driver Modeling . . . . . . . . . . . . 12
Behaviorist Driver Modeling . . . . . . . . . . . 13
Some Recent Driver Maneuver Prediction Methods Based
on Deep Learning Techniques . . . . . . . . . 14
1.2 Research Overview . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.1 Primary Conjecture . . . . . . . . . . . . . . . . . . . . 16
1.2.2 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.3 RoadLAB Vehicular Configuration . . . . . . . . . . . 20
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3.3 Object Detection Stage . . . . . . . . . . . . . . . . . . 86
Model A . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Model B . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3.4 Data Augmentation . . . . . . . . . . . . . . . . . . . . 89
3.3.5 Integrating Detection Results . . . . . . . . . . . . . . 90
3.3.6 Object Recognition Stage . . . . . . . . . . . . . . . . 91
3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 92
3.4.1 Parameters . . . . . . . . . . . . . . . . . . . . . . . . 93
3.4.2 Results for the Object Detection Stage . . . . . . . . . 94
Assessing the Accuracy of the Trained ResNet101 CNN
Model . . . . . . . . . . . . . . . . . . . . . . 94
Assessing the Accuracy of the Object Detection Stage . . 94
3.4.3 Trustworthiness Quantification . . . . . . . . . . . . . 96
3.4.4 Results for Object Recognition Stage . . . . . . . . . . 98
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Eye Glasses/Headband . . . . . . . . . . . . . . . . . . 144
Eye Tracker . . . . . . . . . . . . . . . . . . . . . . . . 145
5.3 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . 146
Metric 1 (M1). . . . . . . . . . . . . . . . . . . . 148
Metric 2 (M2). . . . . . . . . . . . . . . . . . . . 148
Metric 3 (M3). . . . . . . . . . . . . . . . . . . . 149
5.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . 149
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
VITA 185
List of Tables
List of Figures
2.9 A sequence of time slices belonging to a right lane change event.
(t1): Driver goes straight and looks forward. (t2 and t3): Driver
decides to initiate an attempt to change lane, and searches vi-
sually for potential obstacles in the right lane. (tn and tn+1):
Attention of the driver returns to the current lane and the driver
still goes straight. (tT−1): The driver makes the final decision to
change lane and looks at the right lane. (tT): Right lane change
event has occurred. . . . . . . . . . . . . . . . . . . . . . . . . 55
2.10 Confusion matrices of our prediction model . . . . . . . . . . . 61
2.11 The effect of the threshold on the F1 score for IO-HMM and
LSTM models. . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.11 Trustworthiness quantification. . . . . . . . . . . . . . . . . . . 98
3.12 Confusion matrix from trained ResNet101 for traffic sign recog-
nition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.13 Confusion matrix from trained ResNet101 for traffic light recog-
nition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.14 Confusion matrix from trained ResNet101 for vehicle recognition. 100
4.1 The lane detection model provides two lane vectors, each con-
sisting of 14 coordinates in the image plane that represent the
predicted left and right boundaries of the ego lane. . . . . . . . 127
4.2 Forward stereoscopic vision system mounted on rooftop of the
RoadLAB experimental vehicle. . . . . . . . . . . . . . . . . . 128
4.3 Map of the predetermined course for drivers, located in London,
Ontario, Canada. The path includes urban and suburban driving
areas and is approximately 28.5 kilometers long. . . . . . . . . 128
4.4 Examples of annotated samples of our lane detection dataset. . 128
4.5 Visualization of the lane type classification stage, from a sample
road image to the ego lane boundaries. . . . . . . . . . . . . . 130
4.6 Lane boundary samples of our train-and-test data: a) Dashed
White, b) Dashed Yellow, c) Solid White, d) Solid Yellow, e)
Double Solid Yellow, f) Dashed-Solid Yellow, g) Solid-Dashed
Yellow, h) Road Boundary . . . . . . . . . . . . . . . . . . . . 131
4.7 Visualization of the Euclidean error between the predicted lane
coordinates and the corresponding ground truth coordinates. . . 133
4.8 Confusion matrix from ResNet101 for lane type classification. 134
4.9 Output samples of our experiments on the RoadLAB dataset. . 135
6.1 Two samples of PoG of the driver (the red point) . . . . . . . 164
6.2 Overview of our model applied to a sample frame . . . 164
6.3 Output samples of our experiments on the RoadLAB dataset . 168
Chapter 1
Introduction
In this research, we aim to analyze and model driver behavior using real
driving data for designing ADASs for on-road vehicles. A co-driver ADAS
must first understand and analyze driver behavior in order to monitor the
driver. Some ADASs also aim to predict the most probable next maneuver of
the driver, assisting the driver or intervening if deemed necessary. In our work,
we employ a deep learning model to predict driver maneuvers using dynamic
vehicle features and cephalo-ocular behavioral features. Moreover, we identify
driver attention based on the attentional visual field of the driver and four
major traffic object types: vehicle, traffic light, traffic sign, and pedestrian.
To this end, we develop a model to detect and recognize these traffic objects
within the attentional visual field. Furthermore, we attempt to discover where
the driver gazes during driving to reach a better understanding of driver gaze
behavior. Finally, we detect and classify road lanes in urban and suburban
areas, which provides additional contextual information.
The next section presents a literature survey of related research on driver
behavior analysis applications, ADAS systems (focusing on the relationship
between these systems and the driver's role), and driver maneuver prediction.
After the survey, an overview of the research in this thesis is presented, along
with several hypotheses motivating the research, followed by a brief overview
of the instrumented vehicle and the data collected. The chapter concludes
with a summary of the main contributions and the thesis organization.
1.1 Literature Survey
1.1.1 Driver Behavior Analysis Applications
Many studies have been conducted on analyzing driver behavior to achieve dif-
ferent goals, such as safe driving, traffic management, and commercial purposes.
An overview of different driver behavior analysis methods is provided in [65],
which categorized driver behavior analysis applications
into three classes including vehicle-oriented applications, management-oriented
applications, and driver-oriented applications. These categories are described
in more detail in the following along with some of their subcategories.
Vehicle-Oriented Applications
These applications focus mainly on the vehicle, aiming to improve the driving
task and reduce driver workload through advanced systems that assist drivers
in different driving situations. These systems interact with drivers in real
time. This category consists of three main subcategories: "Intelligent Vehicle
Systems and Autonomous Vehicles", "Driver Assistance", and "Accident
Detection".
The first subcategory is a recent area of exploration that seeks to employ
new technologies to automate vehicle tasks [12], [10], [9]. Google developed
its first fully autonomous car prototype [7], followed by car manufacturers
such as Tesla, Mercedes, and Volkswagen. The applications in this subcategory
exploit advanced vehicular control and environmental detection technologies [26]
using real-time data such as traffic information and nearby vehicles.
The second subcategory includes applications that aim to assist the driver
in different driving tasks such as blind spot detection, parking assistance, etc.
Nowadays, these systems are employed by car manufacturers to reduce driver
errors caused by inattention and distraction; examples include emergency
braking systems [18], [42] and lane keeping assistance systems [6], [56].
Management-Oriented Applications
The applications that fall into this category aim to optimize vehicle use,
mainly including fleet management and traffic modeling. These applications
focus on the management of infrastructure and resources by monitoring road
conditions and the vehicle. Such systems identify road conditions based on
driver maneuvers such as acceleration and braking, and on three-axis
acceleration data [48], [5]. Consequently, these technologies yield effective
planning for managing traffic and maintaining roads. Moreover, using such
applications, transport companies can establish effective fleet management,
monitoring their vehicles in terms of speed, safety inspection, and fuel
consumption. They can also reduce the risks for their drivers and vehicles,
decrease their costs, and improve the performance of their services [23],
[31], [1].
Driver-Oriented Applications
Applications in this category consider the driver as the main element. The ma-
jor application areas that fall into this category are ”Driver Attention Evalua-
tion”, ”Distraction Detection”, ”Driving Style Assessment” and ”Driver Intent
Prediction”.
Driver attention evaluation is one of the main research areas in the field
of driver behavior analysis. These applications analyze the attention [51],
[54], [59], [62] and somnolence [28], [13] of the driver during driving, using
information such as facial features, gaze activity, heart rate, and so on.
Distraction detection systems identify the degree of driver focus on the road
and aim to detect driver distraction from driver reactions [21], [30]. The
remaining applications in this category fall into two classes: driving style
assessment and driver intent prediction.
The former aims to categorize the driving mode based on a variety of features
collected from the vehicle and the driver's actions, such as acceleration,
steering, speed, braking, and GPS [53], [20], [58]. In other words, the data
analysis stage in these systems aims to find and assess the correlation between
driving style and the input data. Aggressive and risky styles are the two
styles most commonly studied in this area of research. The resulting
information is of great importance to automobile insurers who calculate
Usage-Based Insurance (UBI) [22], [63]. Using these techniques, the insurance
cost for each driver can be determined based on a driving score. This approach
can increase the affordability of insurance for lower-risk drivers, many of
whom are also lower-income people
[22]. As for driver intent prediction, these applications aim to anticipate the
most probable next maneuver of the driver (overtaking, lane change, emergency
braking, etc.) using methods for the automatic prediction of maneuvers.
1.1.2 Advanced Driver Assistance Systems (ADASs)
ADASs are designed to increase car and road safety by assisting drivers in
dangerous driving situations. ADASs play a critical role in preventing
fatalities and injuries by reducing the number of collisions and the serious
impacts of those accidents that cannot be avoided. These systems may benefit
from various sources of information, including Controller Area Network bus
(CANbus) vehicular data, a GPS system, Lidar, Radar, and cameras, to perform
their tasks. The Society of Automotive Engineers (SAE) has categorized driving
automation into six levels, from 0 to 5 [16]. Fig. 1.1 illustrates these
levels. The following provides an overview of ADASs with consideration of the
relationship between these systems and the role of the driver according to
the level of automation [16], [37].
Level 0 (No Driving Automation)
The majority of vehicles on the road are manually controlled, which means
they are at Level 0. These systems monitor the driving environment and
provide information to the driver but do not control the vehicle. Several
examples of such systems are:
Parking Sensors: provide an acoustic warning about surrounding obstacles,
depending on their distance, while parking a car.
Lane Departure Warning (LDW): alerts the driver if he/she accidentally
leaves the current lane.
Blind Spot Detection (BSD): informs the driver if an obstacle exists in the
blind spot of the rear-facing mirrors.
Forward Collision Warning (FCW): warns the driver about an imminent collision
with an obstacle ahead.
Night Vision: improves the driver's perception of the road ahead in darkness
by means of an IR illuminator and camera.
Level 1 (Driver Assistance)
Level 1 is the lowest level of automation. These systems perform single
functions in specific driving situations and control the vehicle with the
proper actuators. However, Levels 1 and 2 still assign authority to the
driver. Examples of Level 1 systems include:
Anti-lock Braking System (ABS): avoids wheel lock and tire saturation while
braking, providing a shorter braking distance and better vehicle stability.
Electronic Stability Control (ESC): can automatically brake a single wheel to
keep the vehicle stable when the system recognizes that it needs to correct
the steering.
Adaptive Cruise Control (ACC): in addition to keeping the vehicle at the
desired speed, maintains a safe distance from traffic ahead by both cutting
engine power and actuating the brakes.
Emergency Brake Assist (EBA): can automatically apply the brakes if it detects
an impending collision; in an urgent situation where the driver is not braking
adequately, the system provides additional braking power to avoid a collision.
Lane Centering (LC): unlike lane departure systems, which only warn the
driver, maintains the vehicle in the center of the lane by continuously
controlling the steering.
Level 2 (Partial Driving Automation)
As mentioned, Level 1 and Level 2 systems leave the authority to the driver,
but Level 2 systems can perform more complex maneuvers, controlling both
steering and acceleration/deceleration. Tesla Autopilot and Cadillac (General
Motors) Super Cruise both qualify as Level 2. Highway Assist (HA) systems
combine ACC, LC, and BSD to continuously control the vehicle longitudinally
and laterally. These systems can help reduce driver stress and fatigue and
allow drivers to feel safer on highways. Autonomous Obstacle Avoidance
systems, similar to HA, control the vehicle longitudinally and laterally to
avoid an accident with an obstacle. Autonomous Parking systems help the
driver find a suitable parking spot and then assist in parking the car by
controlling steering and speed while avoiding collisions. These systems still
leave the overall authority to the driver.
Level 3 (Conditional Driving Automation)
In Level 3 systems, the driver must remain ready to retake control of the
vehicle, although he/she is not required to continuously monitor the
driving environment. According to the SAE standard, these systems need re-
dundancies in sensors and decision Electronic Control Units (ECU) to perform
their roles. Highway Chauffeur [38] is an example of a Level 3 system. This
system is an evolution of HA that autonomously plans when to overtake and
accepts full responsibility for the maneuver.
Level 4 (High Driving Automation)
In Level 4 systems, the driver is not required to take control of the vehicle
most of the time. These systems extend the scenarios where they can make
decisions, manage situations, and perform all the necessary driving tasks in
those situations. For these systems, an integrated intelligence with all-around
sources of sensing is required. Automatic Valet Parking [39] is an example of
a Level 4 system. In this system, the vehicle takes the responsibility to find a
parking spot and to park the car after the driver has left the vehicle. In Level 4
systems, communication between the vehicle and the infrastructure is usually
needed to improve performance.
Level 5 (Full Driving Automation)
Level 5 is the final level of automation, in which vehicles do not require human
attention. Level 5 vehicles can even lack interfaces such as steering wheels
or acceleration/braking pedals. In fact, the driver is treated as a typical pas-
senger, who just sets a destination and can even sleep while the vehicle is
performing all transportation tasks to arrive at the predetermined destina-
tion.
1.1.3 Driver Maneuver Prediction
In the ADAS context, the prediction of driver maneuvers is one of the principal
targets of driver behavior modeling. Driver maneuvers can be characterized
according to traffic and road infrastructure [2]. Reichart [45] and Tölle [55]
categorized driver maneuvers, as listed in Table 1.1. These two
categories present driving maneuvers on the same level of granularity and only
differ to a minor degree. For example, the list of maneuvers provided by Tölle
[55] is sufficient to fully cover any trip in city and rural areas as well as on high-
ways. This list does not include unexpected changes in traffic conditions such
as the sudden appearance of an obstacle. The other maneuver lists that have
been suggested in the literature are similar to the items mentioned above, the
differences in the list of maneuvers relating mostly to the aim of the intended
application. For instance, the work developed in [35] focuses on maneuvers
that occur on highways.
Table 1.1: List of driver maneuvers provided by [45] and [55]
Reichart                Tölle
Follow lane             Start
React to obstacle       Follow
Turn at intersection    Approach vehicle
Cross intersection      Overtake vehicle
Turn into street        Cross intersection
Change lane             Change lane
Turn around             Turn at intersection
Drive backwards         Drive backwards
Choose velocity         Park
Follow vehicle
Predicting driver maneuvers is a challenging task because the interactions
between the sensors are complex and a driver's intentions cannot be directly
identified. Many internal and environmental
factors can influence driver behavior, which ideally should be considered to
provide a faithful model [2]. These factors include, but are not limited to:
Driver behavior models can be divided into two classes: cognitive driver models
and behaviorist driver models [2].
Some Recent Driver Maneuver Prediction Methods Based on Deep Learning Techniques
Olabiyi et al. [36] proposed a method for anticipating driver actions, including
braking, lane changes, turns, and anomalous actions. Their prediction system
employed a Deep Bidirectional Recurrent Neural Network (DBRNN) with multiple
Long Short-Term Memory (LSTM) and/or Gated Recurrent Unit (GRU) cells to
discover the spatio-temporal dependencies in temporal data. In [46], the
authors presented a new sensory-fusion framework based
ral data. In [46], the authors presented a new sensory-fusion framework based
on deep learning to predict driver maneuvers which utilized a variety of sensory
data such as inside and outside camera videos, vehicle speed, GPS and other
related information. In order to learn spatial relationships and capture long
temporal dependencies, their model took advantage of a combination of dilated
CNN and convolutional neural network maxpooling (CNN maxpooling) pairs.
In [64], a novel model called Cognitive Fusion-RNN (CFRNN) was proposed
to predict driving maneuvers which combined both a cognition-driven model
and data-driven model. The CFRNN model included two LSTM units to fuse
the data from both inside and outside of the vehicle in a cognitive way and
the two LSTM units were regulated by the driver cognition time process. The
authors in [34] proposed a two-stage method to anticipate driver maneuvers.
In the first stage, in addition to the outside features, they extracted
features from the inside frames using the CNN DenseNet121 [44] architecture.
The second stage constructed a CNN-LSTM model, a combination of the two
standard CNN and LSTM models. In
[33], Mora et al. proposed a simplified model to predict emergency braking
intention using a deep learning method and electroencephalogram (EEG) data,
without transforming the EEG data into gray-scale images. Their method was
able to discriminate between normal driving and emergency braking events
using only four electrodes. In [17], a model named Attention-based Global Context
Network (AGCNet) was proposed to predict driver maneuvers. This model
utilizes multi-modal data, including front view frame data and driver physio-
logical data to perform its task. By proposing the Global Context (GC) block
and Channel-wise Attention (CA), AGCNet is capable of generating global
context features and choosing valuable ones in an effective way. The AGCNet
model coupled with a new Dual attention-based LSTM (DaLSTM) network
learns co-occurrence features and predicts driver maneuvers. In [57], a hybrid
deep learning based model was proposed to predict lane-changing behavior of
the driver. The first level of the hybrid model includes Seq2Seq, a variant of
RNN [43], which is mainly employed for temporal data processing to decrease
invisible data loss. The second level includes a fully connected neural network
(FC) to fuse data and classify lane changes. The two-level training model
enables the Seq2Seq-FC network to increase network depth while avoiding the
gradient dispersion problem.
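The surveyed architectures differ in detail, but most share a common core: a recurrent cell consumes a time series of vehicle and gaze features and emits a probability distribution over maneuver classes. The following minimal sketch illustrates only that core, with an untrained LSTM cell and randomly initialized weights; the feature dimensions and the maneuver label set are hypothetical and do not reproduce any of the models in [36], [46], or [64].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell: all four gates computed from [h, x] at once."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for input, forget, output and candidate gates.
        self.W = rng.normal(0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = self.W @ np.concatenate([h, x]) + self.b
        n = self.n_hidden
        i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
        g = np.tanh(z[3 * n:])
        c = f * c + i * g       # update cell state
        h = o * np.tanh(c)      # new hidden state
        return h, c

def predict_maneuver(seq, cell, W_out):
    """Run the sequence through the LSTM, classify the final hidden state."""
    h = np.zeros(cell.n_hidden)
    c = np.zeros(cell.n_hidden)
    for x in seq:
        h, c = cell.step(x, h, c)
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()          # softmax over maneuver classes

# Hypothetical label set and feature layout (speed, steering, gaze yaw, ...).
MANEUVERS = ["go straight", "left lane change", "right lane change",
             "left turn", "right turn"]
rng = np.random.default_rng(1)
cell = LSTMCell(n_in=6, n_hidden=16)
W_out = rng.normal(0, 0.1, (len(MANEUVERS), 16))
seq = rng.normal(0, 1, (30, 6))        # 30 frames of 6 features each
probs = predict_maneuver(seq, cell, W_out)
print(MANEUVERS[int(np.argmax(probs))])
```

In a trained system, the weights would of course be learned from labeled maneuver sequences; the sketch only shows the data flow these works have in common.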
An intelligent ADAS should understand the driving context and monitor it. The system should be able to warn the driver
about an unseen obstacle or a traffic object such as a pedestrian, vehicle, or
sign or even take control of the vehicle in critical situations. Developing models
of understanding and prediction of driver behavior using such data can enable
advancement in technologies relating to the vehicle and its passenger’s safety
and at a higher level, road safety.
1.2.2 Hypotheses
In this section, we break down the main conjecture into several hypotheses
which can be empirically investigated. We address these hypotheses in
Chapters 2, 3, 4, 5, and 6, respectively.
2. It is possible to detect and recognize all traffic objects inside the atten-
tional visual field of the driver: The attentional visual area of the driver
is central to safe driving and is computed as a 2D ellipse in the imaging
plane of the stereo system. We verify this hypothesis by detecting
objects in the traffic scene and determining whether they lie inside the
attentional field of the driver, which has been previously obtained by
Kowsari et al. [27]. This enables us to detect and recognize those objects
located inside the visual attentional area of the driver. To explore this
hypothesis, we focus on the four major traffic object types mentioned
earlier: vehicle, traffic light, traffic sign, and pedestrian.
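Computationally, this hypothesis reduces to a point-in-ellipse test in the image plane: a detected object counts as inside the attentional field when its image location falls within the ellipse. A minimal sketch follows; the ellipse parameters and object coordinates are invented for illustration, while the actual attentional field would be estimated from gaze data as in [27].

```python
import math

def inside_attentional_field(px, py, cx, cy, a, b, theta):
    """Check whether image point (px, py) lies inside an ellipse centered at
    (cx, cy) with semi-axes a and b, rotated by theta radians."""
    # Translate the point into the ellipse's own coordinate frame.
    dx, dy = px - cx, py - cy
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    u = dx * cos_t + dy * sin_t     # component along the major axis
    v = -dx * sin_t + dy * cos_t    # component along the minor axis
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

# Hypothetical attentional ellipse and detected object centers (pixels).
ellipse = dict(cx=320.0, cy=240.0, a=150.0, b=90.0, theta=0.1)
detections = {"pedestrian": (400, 250), "traffic light": (620, 60)}
attended = {name: inside_attentional_field(x, y, **ellipse)
            for name, (x, y) in detections.items()}
print(attended)  # {'pedestrian': True, 'traffic light': False}
```

Objects that pass this test are the ones handed to the detection and recognition stages described in Chapter 3.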
1.2.3 RoadLAB Vehicular Configuration
Our research is based on data gathered in the RoadLAB project. Data was
gathered using an experimental vehicle that was equipped with a forward
stereoscopic system, OBD-II CANbus, and an eye tracker [4] (see Fig. 1.2).
This configured vehicle was able to record data as follows:
2. The stereoscopic system mounted on the vehicle’s roof recorded the front
view of the vehicular driving environment at 30Hz.
The OBD-II interface was connected to the CANbus system of the vehicle to
record vehicle odometry information and driver-related elements such as the
steering wheel, accelerator/brake pedals, and turn
signals. Stereo cameras were employed to collect data on the environment
including road markers and traffic signs. FaceLAB, a commercial gaze and
head tracking system, was employed to gather eye and head positions. In
order to cross-calibrate the stereo system and FaceLAB, a new algorithm was
devised in the RoadLAB research group [27].
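Beyond spatial cross-calibration, recording from sensors running at different rates implies a temporal-alignment step before the streams can be analyzed jointly. As a simple illustration (the timestamps and tolerance below are invented, not RoadLAB values), each 30Hz camera frame can be matched to the nearest gaze sample by timestamp:

```python
import bisect

def align_nearest(frame_ts, gaze_ts, max_gap=0.02):
    """For each camera frame timestamp, return the index of the nearest gaze
    sample, or None when the time gap exceeds max_gap seconds."""
    out = []
    for t in frame_ts:
        i = bisect.bisect_left(gaze_ts, t)
        # Candidates: the gaze sample just before and just after t.
        best = min((j for j in (i - 1, i) if 0 <= j < len(gaze_ts)),
                   key=lambda j: abs(gaze_ts[j] - t))
        out.append(best if abs(gaze_ts[best] - t) <= max_gap else None)
    return out

# Hypothetical timestamps: 30 Hz frames vs. a ~60 Hz gaze stream (seconds).
frames = [0.000, 0.033, 0.067, 0.100]
gaze = [0.001, 0.018, 0.034, 0.051, 0.068, 0.084, 0.101]
print(align_nearest(frames, gaze))  # → [0, 2, 4, 6]
```

The `None` case marks frames with no sufficiently close gaze sample (e.g. during tracker dropouts), which can then be excluded from behavioral analysis.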
Each participant drove the instrumented vehicle on a predetermined 28.5km
course within the city of London, ON, Canada (see Fig. 1.3). The course in-
cludes downtown, urban and suburban areas of the city. The driver sequences
were captured in different weather conditions including sunny (9 driver se-
quences), partially sunny (4 driver sequences), and partially cloudy (3 driver
sequences). Moreover, regarding RoadLAB data, there was ethics approval for
the driving experiments and the use of the resulting data for analysis; the data
was anonymized.
Figure 1.3: Map of the predetermined course for drivers, located in London,
Ontario, Canada. The path includes urban and suburban driving areas and is
approximately 28.5 kilometers long.
1.3 Contributions
This thesis is an inherent part of the RoadLAB research program, instigated
by Professor Steven Beauchemin, and is entirely concerned with vehicular in-
strumentation for the purpose of the study of driver behavior/intent. Chapters
2, 3, 4, and 5 have been published in recognized peer-reviewed venues. In what
follows I describe my contributions with regard to each publication within the
thesis:
CNN-based method to detect and classify road lanes in urban and suburban
areas. In Chapter 5, contributions related to average driver attention esti-
mated based on the attentional visual field of the driver with respect to traffic
objects are presented. In Chapter 6, we describe our method to measure the
average percentage of the driving time in which a driver has gazed at traffic
objects. Finally, Chapter 7 provides conclusions and outlines paths for future
research.
Bibliography
[6] Yougang Bian, Jieyun Ding, Manjiang Hu, Qing Xu, Jianqiang Wang,
and Keqiang Li. “An advanced lane-keeping assistance system with
switchable assistance modes”. In: IEEE Transactions on Intelligent
Transportation Systems 21.1 (2019), pp. 385–396.
[7] B. Bilger. Has the self-driving car at last arrived? The New Yorker
(2013). http://www.newyorker.com/reporting/2013/11/25/131125fa_fact_bilger?currentPage=all.
[9] Xiaozhi Chen, Kaustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler,
and Raquel Urtasun. “Monocular 3d object detection for autonomous
driving”. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. 2016, pp. 2147–2156.
[11] Jae Gyeong Choi, Chan Woo Kong, Gyeongho Kim, and Sunghoon Lim.
“Car crash detection using ensemble deep learning and multimodal data
from dashboard cameras”. In: Expert Systems with Applications 183
(2021), p. 115400.
[15] B. Donmez, L.N.g. Boyle, and J.D. Lee. “Safety implications of provid-
ing real-time feedback to distracted drivers”. In: Accident Analysis &
Prevention 39.3 (2007), pp. 581–590.
[16] M. Galvani. “History and future of driver assistance”. In: IEEE Instru-
mentation & Measurement Magazine 22.1 (2019), pp. 11–16.
[18] G. Griffin, D. Kwiatkowski, and J. Miller. U.S. pat. No. 9248815. Wash-
ington, DC: U.S. Patent and Trademark Office. 2016.
[22] Siniša Husnjak, Dragan Peraković, Ivan Forenbacher, and Marijan Mumdziev.
“Telematics system in usage based motor insurance”. In: Procedia En-
gineering 100 (2015), pp. 816–825.
[23] Nidhi Kalra, Raman Kumar Goyal, Anshu Parashar, Jaskirat Singh,
and Gagan Singla. “Driving Style Recognition System Using Smart-
phone Sensors Based on Fuzzy Logic”. In: CMC-Computers Materials
& Continua 69.2 (2021), pp. 1967–1978.
[24] Alireza Khodayari, Reza Kazemi, Ali Ghaffari, and Reinhard Braun-
stingl. “Design of an improved fuzzy logic based model for prediction
of car following behavior”. In: 2011 IEEE International Conference on
Mechatronics. IEEE. 2011, pp. 200–205.
[25] I.H. Kim, J.H. Bong, J. Park, and S. Park. “Prediction of driver’s in-
tention of lane change by augmenting sensor information using machine
learning techniques”. In: Sensors 17.6 (2017), p. 1350.
[26] Jeamin Koo, Jungsuk Kwac, Wendy Ju, Martin Steinert, Larry Leifer,
and Clifford Nass. “Why did my car just do that? Explaining semi-
autonomous driving actions to improve driver understanding, trust,
[28] Vijay Kumar, Shivam Sharma, et al. “Driver drowsiness detection us-
ing modified deep learning architecture”. In: Evolutionary Intelligence
(2022), pp. 1–10.
[29] Stéphanie Lefèvre, Ashwin Carvalho, Yiqi Gao, H Eric Tseng, and
Francesco Borrelli. “Driver models for personalised driving assistance”.
In: Vehicle System Dynamics 53.12 (2015), pp. 1705–1720.
[30] Tianchi Liu, Yan Yang, Guang-Bin Huang, Yong Kiang Yeo, and Zhip-
ing Lin. “Driver distraction detection using semi-supervised machine
learning”. In: IEEE transactions on intelligent transportation systems
17.4 (2015), pp. 1108–1120.
[31] Eilham Hakimie bin Jamal Mohd Lokman, Vik Tor Goh, Timothy Tzen
Vun Yap, and Hu Ng. “Driving style recognition using machine learning
and smartphones”. In: F1000Research 11.57 (2022), p. 57.
[33] Hermes J Mora and Esteban J Pino. “Simplified Prediction Method for
Detecting the Emergency Braking Intention Using EEG and a CNN
[36] Oluwatobi Olabiyi, Eric Martinson, Vijay Chintalapudi, and Rui Guo.
“Driver Action Prediction Using Deep (Bidirectional) Recurrent Neural
Network”. In: arXiv preprint arXiv:1706.02257 (2017).
[40] World Health Organization et al. Global status report on road safety
2018: Summary. Tech. rep. World Health Organization, 2018.
[41] Sourav Kumar Panwar, Vivek Solanki, Sachin Gandhi, Sankalp Gupta,
and Hitesh Garg. “Vehicle accident detection using IoT and live track-
[42] D. Parker, K. Cockings, and M. Cund. U.S. pat. No. 9682689. Wash-
ington, DC: U.S. Patent and Trademark Office. 2017.
[43] Yuchen Qiao, Kazuma Hashimoto, Akiko Eriguchi, Haixia Wang, Dong-
sheng Wang, Yoshimasa Tsuruoka, and Kenjiro Taura. “Parallelizing
and optimizing neural Encoder–Decoder models without padding on
multi-core architecture”. In: Future Generation Computer Systems 108
(2020), pp. 1206–1213.
[47] D.D. Salvucci and K.L. Macuga. “Predicting the effects of cellular-
phone dialing on driver performance”. In: Cognitive Systems Research
3.1 (2002), pp. 95–102.
[48] Shahram Sattar, Songnian Li, and Michael Chapman. “Road surface
monitoring using smartphone sensors: A review”. In: Sensors 18.11
(2018), p. 3845.
[50] Mohsen Shirpour. “Predictive Model of Driver’s Eye Fixation for Ma-
neuver Prediction in the Design of Advanced Driving Assistance Sys-
tems”. In: (2021).
[52] Kenji Takagi, Haruki Kawanaka, Md. Shoaib Bhuiyan, and Koji Oguri.
“Estimation of a three-dimensional gaze point and the gaze target from
the road images”. In: Intelligent Transportation Systems (ITSC), 14th
International IEEE Conference on. IEEE. 2011, pp. 526–531.
[53] Farid Talebloo, Emad A Mohammed, and Behrouz Far. “Deep Learn-
ing Approach for Aggressive Driving Behaviour Detection”. In: arXiv
preprint arXiv:2111.04794 (2021).
[56] Qun Wang, Weichao Zhuang, Liangmo Wang, and Fei Ju. Lane keeping
assist for an autonomous vehicle based on deep reinforcement learning.
Tech. rep. SAE Technical Paper, 2020.
[57] Cheng Wei, Fei Hui, and Asad J Khattak. “Driver lane-changing behav-
ior prediction based on deep learning”. In: Journal of advanced trans-
portation 2021 (2021).
[58] Samuel Würtz and Ulrich Göhner. “Driving Style Analysis Using Re-
current Neural Networks with LSTM Cells”. In: Journal of Advances
in Information Technology 11.1 (2020).
[59] Guoliang Yuan, Yafei Wang, Huizhu Yan, and Xianping Fu. “Self-
calibrated driver gaze estimation via gaze pattern learning”. In: Knowledge-
Based Systems 235 (2022), p. 107630.
[60] S.J. Zabihi, S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Detec-
tion and recognition of traffic signs inside the attentional visual field
of drivers”. In: 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE.
2017, pp. 583–588.
[61] S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Real-time driving
manoeuvre prediction using IO-HMM and driver cephalo-ocular be-
haviour”. In: Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE.
2017, pp. 875–880.
[62] Yingji Zhang, Xiaohui Yang, and Zhe Ma. “Driver’s Gaze Zone Estima-
tion Method: A Four-channel Convolutional Neural Network Model”.
In: 2020 2nd International Conference on Big-data Service and Intelli-
gent Computation. 2020, pp. 20–24.
[63] Wenyi Zheng, Wei Nai, Fangqi Zhang, Weiyang Qin, and Decun Dong.
“A novel set of driving style risk evaluation index system for UBI-based
differentiated commercial vehicle insurance in China”. In: CICTP 2015.
2015, pp. 2510–2524.
[64] Dong Zhou, Huimin Ma, and Yuhan Dong. “Driving maneuvers pre-
diction based on cognition-driven and data-driven method”. In: 2018
IEEE Visual Communications and Image Processing (VCIP). IEEE.
2018, pp. 1–4.
[65] Kawtar Zinebi, Nissrine Souissi, and Kawtar Tikito. “Driver Behav-
ior Analysis Methods: Applications oriented study”. In: Proceedings of
the 3rd International Conference on Big Data, Cloud and Application
(BDCA 2018). 2018.
Chapter 2
results show that our model, using the identical dataset, improved the F1
score by 4%, to 84%.
2.1 Introduction
The number of vehicles on our streets and highways increases every day. This
fact makes the analysis of traffic situations increasingly complicated. For ex-
ample, in the US alone, at least 33,000 people on average die in road acci-
dents every year, with unsuitable maneuvers being reported as the main cause
for most of these accidents [8]. Hence, vehicle manufacturers have been de-
veloping advanced driver assistance systems (ADASs) to assist the driver in
various driving tasks; such systems are estimated to be able to avoid up to
40% of vehicle accidents [11]. Examples of ADASs include adaptive cruise control, collision
avoidance systems, traffic warning systems, smartphone connectivity, lane de-
parture warning systems, automatic lane centering, blind spot monitoring, etc.
Obviously, improving the reliability and robustness of these systems would
have a significant impact on decreasing the number of collisions and accident
injuries.
An ADAS consists of advanced sensors and camera systems and is activated
when some specific predefined conditions are satisfied. In traditional ADAS,
a threshold is considered for the inputs and if these inputs are greater than
the threshold, the ADAS is activated [21]. Modeling driving behavior of the
driver in different traffic scenes, in addition to understanding surrounding
environment, makes an ADAS more useful for assisting the driver in controlling
the vehicle and avoiding collisions. The goal of this research is to model a
driver’s behavior so that the ADAS can predict the next driving maneuver a
few second before it occurs.
the driving environment information only based on several seconds before the
current situation [4].
In order to predict driver maneuvers, our LSTM-based model learns the
parameters from real driving sequences, including vehicle dynamics, driver’s
head movements, as well as gaze data. Then the model infers the potential
driving maneuvers (namely, left/right turns, left/right lane changes and driv-
ing straight forward) by means of generating a probability for each maneuver.
In other words, the maneuver with the highest probability is considered as the
predicted maneuver.
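The decision rule above can be sketched in a few lines of Python. This is a
minimal illustration, not the thesis implementation: the maneuver labels and
the score-to-probability step are assumptions, and the 0.8 confidence
threshold is the one used later in Section 2.5.1.

```python
import math

# The five maneuver classes predicted by the model (labels are illustrative).
MANEUVERS = ["left_turn", "right_turn", "left_lane_change",
             "right_lane_change", "straight"]

def softmax(scores):
    """Turn raw model scores into a probability distribution (sums to 1)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_maneuver(scores, threshold=0.8):
    """Return the most probable maneuver, or None when no probability
    exceeds the threshold (the system then waits for the next time slice)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return MANEUVERS[best] if probs[best] >= threshold else None
```

Gating the argmax on a confidence threshold is what lets the system defer a
decision until enough evidence has accumulated.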
The rest of this paper is structured as follows. In Section 2.2, we review
the literature. In Section 2.3, we explain our vehicle instrumentation. Section
2.4 contains a description of the proposed method. Section 2.5 presents a
summary of the datasets used, learning parameters, and the experimental
results obtained along with a critical analysis of those results. We discuss
several common reasons resulting in incorrect maneuver prediction in Section
2.6. We give conclusions and future research directions in Section 2.7.
ANNs are suitable techniques for pattern recognition and action prediction
applications, provided that enough experimental data is available. For driver
maneuver prediction, the inputs can be behavioral features, such as acceler-
ation, signaling and braking, and the ANN outputs the predicted maneuver.
For instance, Kim et al. [21] applied an ANN to measurements from the on-
board sensors, such as the steering wheel angle, the yaw rate and the throttle
position, to classify road conditions and to predict the driver’s intention for a
lane change. Leonhardt and Wanielik [27] employed an ANN for lane change
prediction. MacAdam and Johnson [31] represented driver steering behavior
in path regulation control tasks using elementary neural networks. Mitrovic
[34] used neural networks for short-term prediction of lateral and longitudinal
vehicle acceleration.
Although traditional ANNs, such as feed-forward neural nets, are powerful
machine learning techniques, they are black-box learning techniques: they
cannot interpret the relationship between inputs and outputs, and they
cannot handle uncertainties within a standard probabilistic framework.
Another disadvantage is that ANNs consider all input data independent of
each other, while in many applications, such as driver maneuver prediction,
the input data is a sequence of observations taken sequentially in time and, of
course, this temporal information is of great importance.
A Bayesian Network (BN) is a directed acyclic graph that represents
the conditional dependencies among a set of variables, where the directed
edges reflect the qualitative relationships between variables and conditional
probability distributions encode the quantitative relationships. BNs
have been employed for driver maneuver recognition such as overtaking, lane
changes or left/right turns [15, 18, 33]. Amata et al. [1] presented a prediction
model for driver behaviors, such as stopping at intersections based on traffic
conditions. Tezuka et al. [48] used a BN and steering wheel angle data to
develop a model to detect lane keeping, normal lane changes and emergency
lane changes. Also, BNs have been utilized for intersection safety systems to
recognize turning maneuvers at intersections as well as red light crossing [56].
BNs have been used for identifying emergency braking situations [44]. On the
one hand, BNs are suitable for applications, like driver maneuver modeling,
where considering uncertainties in modeling is essential. On the other hand,
considering temporal data using BNs is difficult. Li et al. [28] used a novel
Dynamic Bayesian Network (DBN) in highway scenarios to predict driver ma-
neuvers. DBNs can model temporal changes, although they cause increased
complexity in building and analyzing the network.
Temporal behavior analysis of vehicles surrounding the ADAS vehicle plays
an essential role in the safety of the driver. Hence, other methods have been
proposed to predict the intention of surrounding vehicles. For example, Kim et
al. [20] used an LSTM to propose a trajectory prediction technique for analyz-
ing the temporal behavior of surrounding vehicles and their future positions.
Also, Khosroshahi et al. [19] proposed a framework to classify maneuvers
of observed vehicles at four-way intersections using LSTM and 3D trajectory
cues. Using LSTM, a method has been introduced by Patel et al. [40] to pre-
dict lane changes of surrounding vehicles in highway driving. An RNN-based
model was presented to interpret the time series data about observed vehicles
at signal-less intersections in order to classify their intentions [57].
For recognition of a driver’s intention, many researchers have utilized Hid-
den Markov Models (HMMs). Kuge et al. [25] developed steering behav-
ior models for normal/emergency lane changes, as well as lane keeping using
HMMs. Another approach was proposed by Tran et al. [49] to predict driver
maneuvers, including stop/non-stop, left/right lane changes and left/right
turns in both urban and highway driving environments. They employed differ-
ent input sets to investigate the model performance. He et al. [12] developed a
double-layer HMM structure to model driving behavior and driving intention
in the lower and upper layers, respectively. Amsalu and Homaifar [2] employed
a Genetic Algorithm (GA) for optimization, as well as for predicting a driver’s
intentions when the vehicle approaches an intersection. Aoude et al. [3] devel-
oped two SVM- and HMM-based approaches to estimate driver behaviors at
road intersections. Their results showed that the SVM-based approach often
outperformed the HMM-based model. Jain et al. [16] proposed a maneuver
prediction model based on an Autoregressive Input-Output Hidden Markov
Model (AIO-HMM), which jointly exploits the information inside and outside
of the vehicle.
Similarly, Zabihi et al. [55] developed a maneuver prediction model us-
ing an Input-Output Hidden Markov Model (IO-HMM) that learns relevant
parameters from natural driving sequences. They combined vehicle dynamics
features and two features of driver’s cephalo-ocular behavior, including driver
gaze direction and head pose for detecting driver intent. We followed the work
of Kowsari et al. [24] and Zabihi et al. [55] for feature extraction. We refer
the reader to these publications for more details.
Researchers also focused on driver maneuver prediction at (urban) inter-
sections. Klingelschmitt et al. [23] created two separate Bayesian Network and
Logistic Regression-based models for a vehicle’s driving situation and its be-
havior respectively. Then, they combined them in a single Bayesian Network
to design a model able to predict driver intent. In [42], an indicator-based
approach for driver intent prediction was proposed. They combined context
information with vehicle data. The authors in [30] proposed a new approach
for intersection maneuver prediction that was based on personalized incremen-
tal learning. In other words, they continuously improved the model accuracy
by incorporating individual driving history. Liebner et al. [29] proposed an
approach to predict driver intent including straight intersection crossing and
right turn with the presence or absence of a preceding vehicle. Their model
was based on an explicit parametric model for the longitudinal velocity of
preceding vehicles.
Recurrent Neural Networks (RNNs), Long Short-Term Memories (LSTMs)
and Convolutional Neural Networks (CNNs) have been utilized in different ap-
plications of ADAS and they have shown promising results, such as for driver
activity prediction [17, 38]. Jain et al. [17] employed an RNN with LSTM
units to keep long dependencies over time. They applied their proposed
model on a real dataset to predict driver maneuvers. Olabiyi et al. [38] pro-
posed a method for anticipating driver action using a deep bidirectional RNN
by discovering the relationships between sensor information and future driver
maneuver. For this, they used a fusion of the past and future context. More-
over, deep learning has been employed for other ADAS applications, which has
brought significant improvements, such as classifying a vehicle’s situation for
lane changes as safe/unsafe [43] and detecting a driver’s confusion level [14].
In this study, we aim to apply an LSTM as a deep learning-based method
to our natural driving sequences to predict driver maneuvers several seconds
before they occur. As a result, this would allow an ADAS to take corrective
action if the predicted maneuver is deemed dangerous, or at least warn the
driver. Previously, in
[55], a traditional method based on IO-HMM was proposed to anticipate three
maneuvers of left/right turns and driving straight forward using our dataset.
In addition to the aforementioned maneuvers, our model predicts the maneu-
vers of left/right lane changes as well. Our model takes advantage of three
different aspects of a driving environment in comparison to many previous pro-
posed maneuver prediction methods in the literature. First, since our model
employs an LSTM, which is capable of keeping long-term dependencies in the
temporal data, it is able to predict driver maneuvers better than works em-
ploying classifiers which are not suitable for time series data, such as [27, 41,
26]. The second aspect is related to the number of maneuvers that a maneuver
prediction system is able to predict. As mentioned, our model predicts five
maneuver types, whereas previous works that have predicted maneuvers [35,
10, 51] consider fewer maneuver types. Finally, our model utilizes gaze
information to perform its task, while many previous works, such as [51, 21,
49], ignore this useful information.
Figure 2.2: Map of the predetermined route for drivers, located in London,
Ontario, Canada. The path is approximately 28.5 km long and includes urban
and suburban driving areas.
Figure 2.3: The on-board data recorder interface displaying depth maps, driver
PoG, vehicular dynamics, and eye tracker data.
where
d(e, g) = √((ex − gx)² + (ey − gy)² + (ez − gz)²) (2.2)
The circle is reprojected onto the imaging plane of the forward stereo vision
system, where it becomes a 2D ellipse, as pictured in Figure 2.4. Objects in
the scene that elicit an ocular response from the driver can then be identified
within this area (Figure 2.5). The cross-calibration procedure was devised by
Kowsari et al. [24]. At the time of its deployment, this
was the first publicly known vehicle capable of identifying the 3D PoG of the
driver in real-time and in absolute 3D coordinates.
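The distance in Eq. (2.2) is the Euclidean distance between the eye position
e and the 3D PoG g, and the attentional cone base can be sized at that depth.
A small sketch follows; the cone apex angle is a hypothetical placeholder
rather than a value taken from the thesis.

```python
import math

def gaze_distance(e, g):
    """Euclidean distance d(e, g) of Eq. (2.2) between the eye position e
    and the 3D point of gaze g, both given as (x, y, z) triples."""
    return math.sqrt(sum((ei - gi) ** 2 for ei, gi in zip(e, g)))

def cone_base_radius(e, g, apex_angle_deg=10.0):
    """Radius of the attentional cone base at the depth of the sighted
    feature; the apex angle value is an illustrative assumption."""
    d = gaze_distance(e, g)
    return d * math.tan(math.radians(apex_angle_deg) / 2.0)
```

The base radius grows linearly with the gaze depth, which is why the
reprojected attentional area varies in size across the driving frames.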
Figure 2.4: The attentional visual area of the driver is defined as the base of
the cone located at the depth of sighted features.
Figure 2.5: Two projections of the visual attention cone base on the stereo
imaging plane.
In this work, we focus on driver maneuver prediction using LSTMs [13]. An
LSTM is a particular form of RNN that is suitable for time series data. We
briefly explain the structure of the LSTM. Figure 2.7 shows the internal structure of the
LSTM unit.
Figure 2.6: Overview of the proposed approach for predicting driver maneuvers.
An LSTM is able to keep the information of previous input data in its
memory, called a cell. Hence, it can overcome the vanishing gradient
problem in order to remember long-term dependencies. As mentioned before,
LSTMs have been employed in different ADAS applications [20, 19, 17].
ct = ct−1 ⊙ ft + it ⊙ gt (2.6)
ht = ot ⊙ tanh(ct ), (2.8)
where sigm, tanh and ⊙ are the sigmoid function, the hyperbolic tangent
function, and the element-wise product, respectively. W and b stand for the
weight matrix and bias vector. For multi-class applications, we employ a Soft-
Max layer in which the SoftMax function is applied on a linear transformation
of ht . The following notation describes the internal working of a recurrent
LSTM unit concisely. In Section 2.4.2, we describe how we reach an observa-
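The state updates of Eqs. (2.6) and (2.8) can be sketched as follows. This is
a minimal single-unit (scalar-state) sketch for clarity, using the standard
LSTM gate pre-activations; the weight layout is an assumption, not a
transcription of the thesis implementation.

```python
import math

def sigm(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One step of a single-unit LSTM (scalar states, for clarity).
    w and b hold one (w_x, w_h) weight pair and one bias per gate,
    keyed 'i', 'f', 'o', 'g'; Eqs. (2.6) and (2.8) update the states."""
    pre = {k: w[k][0] * x_t + w[k][1] * h_prev + b[k] for k in "ifog"}
    i_t, f_t, o_t = sigm(pre["i"]), sigm(pre["f"]), sigm(pre["o"])
    g_t = math.tanh(pre["g"])          # candidate cell value
    c_t = c_prev * f_t + i_t * g_t     # Eq. (2.6): cell-state update
    h_t = o_t * math.tanh(c_t)         # Eq. (2.8): hidden-state output
    return h_t, c_t
```

The cell state c_t carries information across time slices, which is what lets
the model retain evidence from earlier gaze and dynamics observations.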
We proceed with describing the features that are extracted for maneuver pre-
diction. These features are divided into two major categories called driver
cephalo-ocular behavioral features and vehicle dynamics features. These fea-
tures are aggregated and normalized for each time slice (i.e., after receiving
20 consecutive frames, every 0.67 seconds of driving) and their combination
constitutes the feature vector, to be fed into the LSTM model. In what follows,
we discuss the extracted features for both categories.
(a) Left turn (b) Left lane change (c) Right turn
Figure 2.8: Gaze points are shown on the driving frames over the last 5 seconds
before a left/right turn, left/right lane change, or going straight maneuver
occurs. Frames are divided into six areas.
Figure 2.9: A sequence of time slices belonging to a right lane change event.
(t1 ): Driver goes straight and looks forward. (t2 and t3 ): Driver decides to
initiate an attempt to change lane, and searches visually for potential obstacles
in the right lane. (tn and tn+1 ): Attention of the driver returns to the current
lane and the driver still goes straight. (tT −1 ): The driver makes the final
decision to change lane and looks at the right lane. (tT ): Right lane change
event has occurred.
vehicular data provide the information that is essential for the application of
driver maneuver prediction.
Vehicle dynamics-based data include vehicle speed, steering wheel angle,
left/right turn signals, brake pedal pressure, gas pedal pressure, and the
speeds of all wheels. We aggregated these features so as to benefit from all of
them simultaneously. For each time slice, we made a histogram of steering
wheel angles and encoded the minimum, average, and maximum values of
vehicle speed, brake pedal pressure, gas pedal pressure, and the independent
wheel speeds. Finally, for the left and right turn signals, we considered a
binary feature for each.
This feature value is 1 if the turn signal is on, and 0 otherwise.
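The aggregation above can be sketched as follows. The field names, bin
count, and steering range are illustrative placeholders assumed for the
sketch; the thesis does not publish this exact encoding.

```python
def time_slice_features(frames, n_angle_bins=8):
    """Aggregate one time slice (20 consecutive frames, 0.67 s) into a
    feature vector. Each frame is a dict with keys 'steer' (degrees),
    'speed', 'brake', 'gas', 'left_signal', 'right_signal'."""
    feats = []
    # Histogram of steering wheel angles over the slice (normalized).
    lo, hi = -540.0, 540.0  # assumed full steering-wheel range
    hist = [0] * n_angle_bins
    for f in frames:
        k = int((f["steer"] - lo) / (hi - lo) * n_angle_bins)
        hist[min(max(k, 0), n_angle_bins - 1)] += 1
    feats.extend(h / len(frames) for h in hist)
    # Minimum, average, and maximum for the continuous channels.
    for key in ("speed", "brake", "gas"):
        vals = [f[key] for f in frames]
        feats += [min(vals), sum(vals) / len(vals), max(vals)]
    # Binary turn-signal features: 1 if the signal was on during the slice.
    feats.append(1.0 if any(f["left_signal"] for f in frames) else 0.0)
    feats.append(1.0 if any(f["right_signal"] for f in frames) else 0.0)
    return feats
```

Each slice thus yields one fixed-length vector, which is what the LSTM
consumes at every time step.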
2.5.1 Dataset
In the test step, the model predicts the driver maneuver every 20 frames
and we expect the prediction system to anticipate the maneuver using only
partial observations of a sequence. Previously, Zabihi et al. [55] proposed an
IO-HMM-based model to anticipate three maneuvers of left/right turns and
driving straight using our real driving dataset. To compare the performance
of our model with theirs, as a first experiment, we employed our approach to
predict Zabihi’s maneuvers only. In the second experiment, in addition to the
aforementioned maneuvers, we utilized our method to predict the maneuvers
of left/right lane changes. For each time slice (i.e. after receiving 20 frames),
the model generates the probability for each maneuver. Obviously, the sum
of these probabilities should be 1. Then, the maneuver with the highest prob-
ability is chosen as the predicted maneuver only if it is higher than a preset
threshold. If the highest probability is less than the threshold (0.8), the sys-
tem cannot predict the driver maneuver and requires reception of additional
features from the next time slice to perform its task. Note that if the maneuver
occurs and the system still has not predicted it, the system makes no predic-
tion. We verified the performance of our model by calculating the measures of
precision and recall for each maneuver. These measures are defined as follows:
Pr = tp / (tp + fp) (2.10)
and
Re = tp / (tp + fn), (2.11)
where, for each maneuver m, tp is the number of correctly predicted instances
of maneuver m, fp is the number of incorrectly predicted instances of maneuver
m, and fn is the number of instances of maneuver m that are wrongly not
predicted or for which the system does not choose any maneuver. In other
words, precision
is the number of correctly predicted instances of maneuver m divided by the
number of instances that were predicted as maneuver m. Recall is the number
of instances of correctly predicted maneuver m divided by the total number
of instances of maneuver m. We computed the average of precision and recall.
We also computed the average time-to-maneuver for true predictions (tp),
which indicates the interval between the time of the algorithm’s prediction
and the start of the maneuver. Zabihi et al. [55] performed several experiments
and reported that utilizing IO-HMM with the data on the driver’s gaze and
head pose (IO-HMM G+H) yielded the best model in terms of precision, recall,
and time-to-maneuver.
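Eqs. (2.10) and (2.11), together with the F1 score used below in Eq. (2.12),
reduce to a few lines per maneuver; a minimal sketch over the per-maneuver
counts:

```python
def precision_recall(tp, fp, fn):
    """Per-maneuver precision (Eq. 2.10) and recall (Eq. 2.11) from the
    counts of true positives, false positives, and false negatives."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    return pr, re

def f1_score(pr, re):
    """Harmonic mean of precision and recall (Eq. 2.12)."""
    return 2.0 * pr * re / (pr + re) if pr + re else 0.0
```

The guards against empty denominators are a practical convention for classes
with no predictions, not something specified in the thesis.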
Table 2.2 compares our results (considering three and five maneuvers) with
their best results. As can be seen, our LSTM-based model outperformed their
prediction model. To be exact, the precision and recall of our model for the
three maneuvers are 6.1% and 0.8% higher, respectively, than those of the
previous work by Zabihi et al. [55] for these three maneuvers. However, their method
can predict the three maneuvers 0.16s earlier on average than ours. The last
row in Table 2.2 shows the results of extending our model to predict two more
types of maneuvers. In this case, we obviously expect more complexity for the
problem and results show that precision, recall and time-to-maneuver have
decreased slightly in comparison with our method for predicting only three
maneuvers.
Figure 2.10 shows the confusion matrices for our prediction system for
three and five maneuvers. In these matrices, a row represents an instance of
the actual maneuver class, whereas a column represents an instance of the
predicted maneuver class. Consequently, the values of the diagonal elements
Figure 2.10: (a) Model with three maneuvers. (b) Model with five maneuvers.
Figure 2.11 compares the changes of the F1-score when we employ our
model and the IO-HMM-based model, with different values for the threshold.
The F1-score is the harmonic mean of Pr and Re , where it can reach 1 with
perfect precision and recall, and 0 in the worst case. In other words, the
prediction threshold is a useful parameter to find a trade-off between precision
and recall.
Figure 2.11: The effect of the threshold on the F 1 score for IO-HMM and
LSTM models.
F1 = 2 Pr Re / (Pr + Re) (2.12)
As can be seen, the trend of F1-scores for the IO-HMM model remains roughly
stable when the threshold changes. However, when we choose 0.8 for the
threshold, the LSTM-based prediction model achieves a significantly higher
F1-score in comparison with the IO-HMM model. In Table 2.2, we utilized the
threshold values which gave us the highest F1-score. Our model predicts ma-
neuvers every 0.67 seconds (20 frames) in 2.8 milliseconds on average on a
3.40 GHz Core i7-6700 CPU under Windows 10.
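Selecting the threshold that maximizes the F1 score, as done for Table 2.2,
can be sketched as follows. The triple-based bookkeeping is an illustrative
simplification of the per-maneuver counting described above, and the
candidate threshold grid is an assumption.

```python
def best_threshold(results, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Pick the prediction threshold maximizing the F1 score. `results`
    holds (top_probability, predicted_label, true_label) triples, one per
    evaluated time slice; slices left unpredicted count as false negatives."""
    def f1_at(t):
        tp = fp = fn = 0
        for p, pred, true in results:
            if p >= t:
                if pred == true:
                    tp += 1
                else:
                    fp += 1
            else:
                fn += 1  # no prediction was made for this maneuver
        pr = tp / (tp + fp) if tp + fp else 0.0
        re = tp / (tp + fn) if tp + fn else 0.0
        return 2.0 * pr * re / (pr + re) if pr + re else 0.0
    return max(thresholds, key=f1_at)
```

Raising the threshold trades recall for precision: low-confidence (often
wrong) predictions are suppressed, but some maneuvers go unpredicted.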
Finally, we briefly mention here the results of several previous works which
have also addressed the driver maneuver prediction problem, using their own
dataset and features. For instance, Morris et al. [36] accomplished a binary
classification of lane changes and driving straight maneuvers. They employed a
Relevance Vector Machine (RVM; a Bayesian extension to the popular SVM).
In addition, Jain et al. [17] evaluated some algorithms for the same purpose
(including SVM, Bayesian Network and variants of their deep learning model).
The methods listed in Table 2.3 use identical feature vectors, which guarantees
a fair comparison.1 As can be observed, the SVM classification does not model
the temporal aspect of the data, and its performance is poor as a result.
We discuss some major reasons that can generally result in wrong anticipa-
tions in the driver maneuver prediction problem. For example, when a driver
is interacting with other passengers, head and gaze features are not reliable
enough to be taken into account. Also, a driver may be distracted when he/she
is watching videos, programming a GPS, using a cell phone, adjusting the
radio, smoking, etc. In such situations, wrong anticipation is common as
1. The methods listed in the Table are: SVM: Support Vector Machine; IO-HMM:
Input-Output Hidden Markov Model; AIO-HMM: Auto-Regressive Input-Output Hidden
Markov Model; S-RNN: Simple Recurrent Neural Network; F-RNN-UL: Fusion-Recurrent
Neural Network with Uniform Loss; F-RNN-EL: Fusion-Recurrent Neural Network with
Exponential Loss.
the driver may not be fully focused on the road. Moreover, different drivers
have different driving styles. For example, during a lane change maneuver,
some drivers may merge slowly while others may merge quickly; in the latter
case, the driver has not provided the system with enough data and time to
predict the maneuver. Hence, in this situation, other features, such as speed,
acceleration, and steering wheel angle, can be significant for predicting the
maneuver accurately. As another example, when drivers rely on their recent
perception of the traffic scene, they probably do not check blind spots and
the surroundings carefully, resulting in a lack of head information, although
we may still have valid gaze features. A similar driving situation is when a
driver is driving in left/right-turn-only lanes. In this case, the driver
might not give us helpful head information either.
Several limitations exist and could be addressed to improve the accuracy
and generality of the model. Adding more features from the environment,
such as the lane in which the driver is located or where the driver is gazing dur-
ing the driving maneuver, could improve the accuracy of the model. In terms of
generality, the tests conducted in this research were based on a limited number
of drivers and under specific weather and environmental conditions. Collect-
ing new data under different situations and training the model on a broader
set of data could help the generality of the model. Hence, for the commercial
use of this model, the aforementioned items need to be considered. Lastly,
this research area is still challenging and more research is needed before
such models are practical for commercial use. As for future work, we plan to study
the extraction of features from video within the attentional visual area of the
driver. We believe that utilizing LSTM trained with a combination of these
features, with cephalo-ocular behavior and the vehicle dynamics will improve
current prediction results.
Bibliography
[4] Abdelhadi Azzouni and Guy Pujolle. “A long short-term memory recur-
rent neural network framework for network traffic matrix prediction”.
In: arXiv preprint arXiv:1705.05690 (2017).
[5] S.S. Beauchemin, M.A. Bauer, T. Kowsari, and J. Cho. “Portable and
Scalable Vision-Based Vehicular Instrumentation for the Analysis of
[8] National Highway Traffic Safety Administration. “2012 motor vehicle
crashes”. Tech. Rep. Washington, D.C., 2013.
[11] Olaf Gietelink, Jeroen Ploeg, Bart De Schutter, and Michel Verhae-
gen. “Development of advanced driver assistance systems with vehicle
hardware-in-the-loop simulations”. In: Vehicle System Dynamics 44.7
(2006), pp. 569–590.
[12] Lei He, Chang-fu Zong, and Chang Wang. “Driving intention recogni-
tion and behaviour prediction based on a double-layer hidden Markov
[14] Chiori Hori, Shinji Watanabe, Takaaki Hori, Bret A Harsham, John R
Hershey, Yusuke Koji, Yoichi Fujii, and Yuki Furumoto. “Driver con-
fusion status detection using recurrent neural networks”. In: Multime-
dia and Expo (ICME), 2016 IEEE International Conference on. IEEE.
2016, pp. 1–6.
[16] Ashesh Jain, Hema S Koppula, Bharad Raghavan, Shane Soh, and
Ashutosh Saxena. “Car that knows before you do: Anticipating ma-
neuvers via learning temporal driving models”. In: Proceedings of the
IEEE International Conference on Computer Vision. 2015, pp. 3182–
3190.
[17] Ashesh Jain, Avi Singh, Hema S Koppula, Shane Soh, and Ashutosh
Saxena. “Recurrent neural networks for driver activity anticipation
via sensory-fusion architecture”. In: Robotics and Automation (ICRA),
2016 IEEE International Conference on. IEEE. 2016, pp. 3118–3125.
[18] Dietmar Kasper, Galia Weidl, Thao Dang, Gabi Breuel, Andreas Tamke,
Andreas Wedel, and Wolfgang Rosenstiel. “Object-oriented Bayesian
networks for detection of lane change maneuvers”. In: IEEE Intelligent
Transportation Systems Magazine 4.3 (2012), pp. 19–31.
[19] Aida Khosroshahi, Eshed Ohn-Bar, and Mohan Manubhai Trivedi. “Sur-
round vehicles trajectory analysis with recurrent neural networks”. In:
Intelligent Transportation Systems (ITSC), 2016 IEEE 19th Interna-
tional Conference on. IEEE. 2016, pp. 2267–2272.
[20] ByeoungDo Kim, Chang Mook Kang, Seung Hi Lee, Hyunmin Chae,
Jaekyum Kim, Chung Choo Chung, and Jun Won Choi. “Probabilistic
vehicle trajectory prediction over occupancy grid map via recurrent
neural network”. In: arXiv preprint arXiv:1704.07049 (2017).
[21] I.H. Kim, J.H. Bong, J. Park, and S. Park. “Prediction of driver’s in-
tention of lane change by augmenting sensor information using machine
learning techniques”. In: Sensors 17.6 (2017), p. 1350.
[27] Veit Leonhardt and Gerd Wanielik. “Neural network for lane change
prediction assessing driving situation, driver behavior and vehicle move-
ment”. In: Intelligent Transportation Systems (ITSC), 2017 IEEE 20th
International Conference on. IEEE. 2017, pp. 1–6.
[28] Junxiang Li, Xiaohui Li, Bohan Jiang, and Qi Zhu. “A maneuver-
prediction method based on dynamic bayesian network in highway sce-
narios”. In: 2018 Chinese Control And Decision Conference (CCDC).
IEEE. 2018.
[34] Dejan Mitrovic. “Machine learning for car navigation”. In: Engineering
of Intelligent Systems. Springer, 2001, pp. 670–675.
[35] Hermes J Mora and Esteban J Pino. “Simplified Prediction Method for
Detecting the Emergency Braking Intention Using EEG and a CNN
Trained with a 2D Matrices Tensor Arrangement”. In: International
Journal of Human–Computer Interaction (2022), pp. 1–14.
[38] Oluwatobi Olabiyi, Eric Martinson, Vijay Chintalapudi, and Rui Guo.
“Driver Action Prediction Using Deep (Bidirectional) Recurrent Neural
Network”. In: arXiv preprint arXiv:1706.02257 (2017).
[39] Jane Oruh, Serestina Viriri, and Adekanmi Adegun. “Long Short-Term
Memory Recurrent Neural Network for Automatic Speech Recogni-
tion”. In: IEEE Access 10 (2022), pp. 30069–30079.
[40] Sajan Patel, Brent Griffin, Kristofer Kusano, and Jason J Corso. “Pre-
dicting Future Lane Changes of Other Highway Vehicles using RNN-
based Deep Models”. In: arXiv preprint arXiv:1801.04340 (2018).
[41] Jinshuan Peng, Yingshi Guo, Rui Fu, Wei Yuan, and Chang Wang.
“Multi-parameter prediction of drivers’ lane-changing behaviour with
neural network model”. In: Applied ergonomics 50 (2015), pp. 207–217.
[43] Oliver Scheel, Loren Schwarz, Nassir Navab, and Federico Tombari.
“Situation Assessment for Planning Lane Changes: Combining Recur-
rent Models and Prediction”. In: arXiv preprint arXiv:1805.06776 (2018).
[44] Joerg Schneider, Andreas Wilde, and Karl Naab. “Probabilistic ap-
proach for modeling and identifying driving situations”. In: Intelligent
Vehicles Symposium, 2008 IEEE. IEEE. 2008, pp. 343–348.
[46] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. “Sequence to sequence
learning with neural networks”. In: Advances in neural information pro-
cessing systems 27 (2014).
[47] Kenji Takagi, Haruki Kawanaka, Md. Shoaib Bhuiyan, and Koji Oguri.
“Estimation of a three-dimensional gaze point and the gaze target from
the road images”. In: Intelligent Transportation Systems (ITSC), 14th
International IEEE Conference on. IEEE. 2011, pp. 526–531.
[48] Shigeki Tezuka, Hitoshi Soma, and Katsuya Tanifuji. “A study of driver
behavior inference model at time of lane change using Bayesian net-
works”. In: Industrial Technology, 2006. ICIT 2006. IEEE International
Conference on. IEEE. 2006, pp. 2308–2313.
[49] Duy Tran, Weihua Sheng, Li Liu, and Meiqin Liu. “A Hidden Markov
Model based driver intention prediction system”. In: Cyber Technology
in Automation, Control, and Intelligent Systems (CYBER), 2015 IEEE
International Conference on. IEEE. 2015, pp. 115–120.
[50] Shaobo Wang, Pan Zhao, Biao Yu, Weixin Huang, and Huawei Liang.
“Vehicle trajectory prediction by knowledge-driven lstm network in
urban environments”. In: Journal of Advanced Transportation 2020
(2020).
[51] Cheng Wei, Fei Hui, and Asad J Khattak. “Driver lane-changing behav-
ior prediction based on deep learning”. In: Journal of advanced trans-
portation 2021 (2021).
[52] Jürgen Wiest, Matthias Karg, Felix Kunz, Stephan Reuter, Ulrich Kreßel,
and Klaus Dietmayer. “A probabilistic maneuver prediction framework
[55] S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Real-time driving
manoeuvre prediction using IO-HMM and driver cephalo-ocular be-
haviour”. In: Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE.
2017, pp. 875–880.
[56] Jianwei Zhang and Bernd Roessler. “Situation analysis and adaptive
risk assessment for intersection safety systems in advanced assisted driv-
ing”. In: Autonome Mobile Systeme 2009. Springer, 2009, pp. 249–258.
[57] Alex Zyner, Stewart Worrall, and Eduardo Nebot. “A Recurrent Neu-
ral Network Solution for Predicting Driver Intention at Unsignalized
Intersections”. In: IEEE Robotics and Automation Letters 3.3 (2018),
pp. 1759–1764.
[58] Alex Zyner, Stewart Worrall, James Ward, and Eduardo Nebot. “Long
short term memory for driver intent prediction”. In: 2017 IEEE Intel-
ligent Vehicles Symposium (IV). IEEE. 2017, pp. 1484–1489.
Chapter 3
3.1 Introduction
Advanced Driver Assistance Systems (ADASs) have attracted the attention
of many researchers and vehicle manufacturers for several decades. Achieving
higher performance levels for an ADAS requires a robust perception of the
driving environment. Hence, vision-based traffic scene perception, which refers
to identifying the positions of traffic objects such as pedestrians, vehicles,
and traffic signs, is of great importance in designing a modern ADAS.
In practice, however, many traffic scene issues, such as occlusions, weather
conditions, shadows, and distant-object identification, affect the performance
of such systems. Improving the accuracy and adaptability of such methods is
still a challenging area of research [89]. In this study, we focus on four essential
categories of objects: traffic signs, vehicles, traffic lights, and pedestrians.
Correctly detecting and localizing these classes of objects in the context of
an ADAS remains a difficult challenge. Typical problems include variations
in viewpoint, object shape, size, color, distance from the sensors, illumination
conditions, and object occlusion [4], [20], [25].
Our contributions include collecting and labelling a large dataset of images
of different objects, and proposing an integrated framework to detect and
recognize traffic objects, including traffic signs, vehicles, traffic lights,
and pedestrians. Our model inherits the advantages of deep neural networks
(ResNet and Faster R-CNN) and classical machine learning models (multi-scale
HOG-SVM). This framework is the first of its kind to perform these tasks
while taking the attentional visual field of the driver into consideration.
This is an important aspect of an ADAS, as it allows the system to identify
objects seen and not seen by the driver, among other things.
This contribution is organized as follows: In Section 3.2, we review the
related literature. Section 3.3 describes the datasets we used and the proposed
method. Section 3.4 presents the experimental results obtained along with a
critical analysis. Conclusions and future research directions are described in
Section 3.5.
Generic object detection algorithms can be divided into two major types:
traditional methods and deep learning-based methods. In this section, we briefly
review both. Several object detection surveys can be found in [113], [114],
[90], [118], [98], and [30].
Among traditional object detectors we find the framework proposed by
Viola and Jones, which employs sliding-window searches and AdaBoost
classifiers [95]. Another popular framework is the linear Support Vector
Machine (SVM) classifier with features such as Histograms of Oriented
Gradients (HOG), Scale Invariant Feature Transforms (SIFT), and Local
Binary Patterns (LBP). For example, in [53] and [22], researchers employed SVM
and a multi-scale detection framework with HOG features to detect birds and
pedestrians, respectively. Finally, Aggregated Channel Features (ACF) is
another successful detection framework, proposed in [21]. This method also
uses sliding-window searches and AdaBoost to detect objects in a multi-scale
fashion [70], [64].
Unlike traditional object detection algorithms, which rely on prior knowledge,
deep learning-based object detection methods attempt to learn high-level
features from massive amounts of data. As a result, they are less sensitive to
illumination changes, deformations, and geometric transformations [86]. There
are two major types of deep learning-based object detection methods: region-based
and regression-based. The former first generates region proposals and then
classifies them into object categories, while the latter casts object detection
as a regression problem and predicts locations and class probabilities directly
from the whole image [113]. Region-based methods mainly include R-CNN [30],
Fast R-CNN [29], Faster R-CNN [78], R-FCN [16], SPP-net [38], and
Mask R-CNN [36]. Regression-based methods mainly include AttentionNet
[107], G-CNN [67], SSD [62], YOLO [75], YOLOv2 [76], YOLOv3 [77], DSOD
[82], and DSSD [27].
gions [18], [54]. These methods are generally sensitive to variations in illumi-
nation and the distance to traffic signs [79]. Traffic signs also have specific
shapes that can be searched for by shape-based methods. The Hough trans-
form is one of the most common shape-based methods [68], [106], as it is
relatively robust against illumination change and image noise. Similarity de-
tection [80] and Distance Transform matching [28] also constitute shape-based
methods. Hybrid approaches take advantage of both sign color and shape [19],
[74]. Classification stages mostly employ template matching [91], [33], SVM
[103], [110], Genetic Algorithm (GA) [50], Artificial Neural Network (ANN)
[35], [39], AdaBoost [11], [60], and deep learning-based methods. In recent
years, deep learning methods have attracted a great deal of attention.
Convolutional Neural Networks (CNNs) constitute a subset of deep neural
network models that can learn robust and discriminative features from raw
data. A variety of CNNs have been employed for traffic sign recognition,
such as small-scale CNNs [117], multi-scale CNNs [81], a committee of
CNNs [15], multi-column CNNs [14], multi-task CNNs [52], and CNN-SVM
[58], [55], among others. A number of traffic sign datasets have been created
in the past decade. However, the methods proposed in the literature are
mostly based on European datasets. As traffic signs in North America differ
in color and shape, methods developed for European traffic signs are not
directly suitable in the North American context [108].
regards to HG, there are various methods that can be divided into three basic
categories: knowledge-based, stereo-based, and motion-based [87].
Knowledge-based methods use prior knowledge including shadows [93],
symmetry [49], edges [66], color [104], texture [8], corners [5], and vehicle
lights [10]. Stereo-based approaches usually exploit Inverse Perspective
Mapping (IPM) [6] or disparity maps [26] to localize vehicles, while
motion-based methods detect vehicles with optical flow [63]. HV approaches
can be classified into two major categories [87]: template-based and
appearance-based. The former employs predefined vehicle patterns and
estimates the correlation between templates and candidate image regions [34],
while the latter uses machine learning methods such as SVM [92], ANN [31],
and AdaBoost [85] to classify hypotheses into vehicle and non-vehicle categories.
Classifiers such as SVM [92], ANN [31], and AdaBoost [85] learn the char-
acteristics of vehicle appearance to draw a decision boundary between vehicle
and non-vehicle classes. In HV, a number of local feature descriptors such
as HOG [105], PHOG [49], Haar-like [100], Gabor [65], and SURF [59] have
shown a remarkable ability to capture contextual information. Additionally,
vehicle detection approaches employing the deep learning-based methods
discussed in Section 3.2.1 have been proposed. For instance, in [23], the
authors provided a comparative study of the performance of AlexNet and
Faster R-CNN models. In [116], the authors fine-tuned YOLO [75] for vehicle
detection, and in [42], vehicles were detected with a simplified Fast R-CNN.
Many traditional methods for pedestrian detection have been proposed with
the majority of them using features such as HOG [17], Haar-Like [72], Viola-
Jones [94], and LBP [97], followed by a classification stage using either SVM,
ANN, or AdaBoost. Additionally, pedestrian detection methods using deep
learning can be categorized as either single-stage or two-stage techniques.
RPN+BF [111] and Faster R-CNN [112] are examples where the authors
employed a two-stage approach. Single-stage approaches have also been
proposed: for instance, Lan et al. [56] modified YOLOv2 into a single-stage
network called YOLO-R for pedestrian detection. Comprehensive surveys on
pedestrian detection are provided in [7] and [1].
Color segmentation is a method often used to reduce the search space in traf-
fic scene images. For example, in [12] and [9], the authors employed HSI and
YCbCr color spaces respectively to detect traffic lights. In some studies, a
shape-based method such as the circular Hough transform [71] was used af-
ter color segmentation to find round traffic lights. Blob detection is another
approach to detect traffic lights that analyses the size and aspect ratio of the
traffic lights to eliminate regions likely to produce false positives [115]. In
[47], saliency maps are employed to detect traffic lights. In [48], GPS data
and digital maps are used to identify traffic lights in urban areas. Feature
descriptors such as HOG [12], Haar-like [32], and Gabor Wavelets [9] have
been extensively used to detect traffic lights. To recognize the state of traffic
lights, several methods have been employed mostly including SVM [83], fuzzy
algorithms [2] and more recently, deep learning methods. A simple CNN was
used by Lee and Park [57] for traffic light classification. Behrendt et al. [4]
applied YOLOv1 for detection and classification. In [45], YOLO-9000 [76]
was applied to the LISA traffic light dataset. The authors in [99] exploited
DeepTLR networks for real-time traffic light detection and classification. A
novel Faster R-CNN hierarchical architecture was proposed in [73] and trained
on a joint traffic light and sign dataset.
Prior to our work, little attention had been paid in previous research to
simultaneously detecting the different major classes of traffic objects. Hence,
one aspect that distinguishes our work from others is that, in addition to
detecting more major classes of traffic objects, we also classify them into
their own subcategories.
In this section, we describe our proposed method for traffic object detection
and recognition based on the attentional visual field of the driver. First, we
introduce the dataset used in this research. Following this, we describe the
method employed to find the attentional gaze area of the driver in the forward
stereo imaging system. Next, in the object detection stage, the trained models
and the methods used to enrich our dataset are described. We then discuss
the Regions of Interest (ROIs) integration method we used. Finally, the object
recognition stage is presented. Figure 3.1 illustrates our proposed framework.
own object dataset from the RoadLAB experimental data sequences [3], [51],
[84]. As one of our contributions in this study, in order to train, validate and
test our models, we collected 13,546 sample images to detect and recognize
traffic objects including traffic signs, vehicles, pedestrians and traffic lights.
Our dataset contains 3,225 sample images for the background class in addition
to 5,172, 1,984, 1,290 and 1,875 sample images for the object classes of traffic
sign, vehicle, pedestrian, and traffic light, respectively. The vehicle class
consists of three distinct classes: car, bus, and truck. The traffic light class
consists of four distinct classes: red, yellow, green, and not clear. Finally, the
traffic sign class includes 19 distinct classes of traffic signs. Additionally, some
traffic sign classes include more than one sign type, such as “Maximum Speed
Limit”, “Construction”, and “Parking”. Our traffic sign samples can be
considered a comprehensive sign dataset, including warning signs, regulatory
signs, direction signs, and temporary signs.
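As a quick sanity check, the per-class counts above sum exactly to the stated dataset size:

```python
# Per-class sample counts as reported above.
counts = {"background": 3225, "traffic sign": 5172, "vehicle": 1984,
          "pedestrian": 1290, "traffic light": 1875}
total = sum(counts.values())  # 13,546 sample images in total
```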
The visual attentional field of the driver consists of a circle in 3D space within
the plane that contains the Point of Gaze (PoG) and is perpendicular to the
Line of Gaze (LoG). The radius of the circle is determined by the angular
opening of the cone of visual attention, as shown in Figure 3.2. This circle is
generally projected onto the imaging plane of the stereo sensor as a 2D ellipse.
We describe the procedure we employed, as per Kowsari et al. [51].
First, both the eye position e = (ex, ey, ez) and the 3D PoG g = (gx, gy, gz)
are transformed into the reference frame of the forward stereo sensor. Next,
the radius of the circular attentional gaze area is obtained by computing the
Figure 3.2: (top): Depiction of the driver attentional gaze cone. (bottom):
Re-projection of the 3D attentional circle into the corresponding 2D ellipse on
image plane of the forward stereo scene system.
Figure 3.3: Examples of attentional gaze areas projected onto the forward stereo
sensor of the vehicle.
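Although the full procedure follows Kowsari et al. [51], the basic geometry can be sketched as follows: the radius of the attentional circle grows with the gaze distance and the tangent of the cone's half-angle. The function below is a minimal illustration under that assumption; the function name, the eye/gaze coordinates, and the 5-degree half-angle are hypothetical, not the thesis's parameters.

```python
import math

def attentional_radius(e, g, half_angle_deg):
    """Radius of the attentional circle in the plane containing the PoG.

    e: 3D eye position; g: 3D Point of Gaze, both expressed in the
    forward stereo sensor's reference frame. The circle lies in the
    plane through g, perpendicular to the Line of Gaze from e to g.
    """
    d = math.dist(e, g)  # gaze distance along the LoG
    return d * math.tan(math.radians(half_angle_deg))

# A gaze point 10 m ahead of the eye, with an assumed 5-degree half-angle:
r = attentional_radius((0.0, 1.2, 0.0), (0.0, 1.2, 10.0), 5.0)
```

Projecting this 3D circle through the stereo sensor's intrinsics then yields the 2D ellipse shown in Figure 3.2.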
To detect traffic objects of interest inside and outside of the attentional field
of the driver, we employed a framework consisting of two different model types
that we proceed to describe:
Model A
The first model consists of two steps: a multi-scale HOG-SVM followed by
a ResNet101 network. The HOG descriptor counts occurrences of gradient
orientations in an image region, followed by a block-normalization step that
yields better invariance to edge contrast and shadows. Since it operates on
local cells, it is also relatively invariant to geometric and photometric
transformations. In general, the detection algorithm is based on an overlapping
sliding-window approach. Since the Region of Interest (ROI) contains objects
that vary in size, we used a multi-scale method for the object detection
problem. We treat the HOG features extracted from each sliding window at
each level as independent samples prior to feeding them to the SVM classifier.
Figure 3.4 illustrates the internal view of the multi-scale HOG-SVM.
We trained four independent multi-scale HOG-SVM models to find ROIs,
for our four types of traffic objects (signs, vehicles, pedestrians, and traffic
lights). The model moves a sliding window across the images and HOG fea-
tures are extracted. The model follows this strategy at several imaging scales.
Typically, an SVM outputs binary decision labels. However, it can
also provide a probabilistic confidence score [61] for each sliding window, which
we use to threshold candidate ROIs. With the HOG-SVM, we discard the ROIs
labelled as background, while the other candidates are passed to the next stage
of processing.
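The overlapping multi-scale search can be sketched as follows. This is a minimal illustration: the 64-pixel window, 16-pixel stride, and 1.5 scale factor are hypothetical defaults, not the trained models' settings. Shrinking the image at each pyramid level is equivalent to scanning the original image with a proportionally larger window:

```python
def sliding_windows(width, height, win=64, stride=16, scale=1.5, min_size=64):
    """Yield (level, x, y, w, h) boxes over an image pyramid.

    At each level the image is conceptually shrunk by `scale`, so the
    fixed window covers progressively larger objects; the coordinates
    yielded here are mapped back to original-image pixels.
    """
    level, factor = 0, 1.0
    while width / factor >= min_size and height / factor >= min_size:
        size = int(win * factor)     # window side at this level
        step = int(stride * factor)  # overlapping stride at this level
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield level, x, y, size, size
        factor *= scale
        level += 1

boxes = list(sliding_windows(256, 128))
```

Each yielded box would then be described by HOG features and scored by the corresponding SVM.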
The remaining ROIs from the HOG-SVM classifier were categorized into
five classes: background, traffic sign, vehicle, pedestrian, and traffic light. In
the second stage we applied ResNet101 [37], a popular CNN that has already
been trained on more than a million images from the ImageNet database [69].
Figure 3.5 illustrates sample results obtained with this model.
However, during our empirical trials, we noted the multi-scale HOG-SVM had
difficulty localizing vehicles occupying a large part of the image (Figure 3.6
illustrates this problem). Hence, we also used a Faster R-CNN model to detect
vehicles.
Model B
mislabeled objects in the preparation of the training set for the next iteration.
We finally provided the models with additional key samples, which made them
more robust.
After completing the detection stage on test images, in order to improve
detection performance, we eliminated redundant detections and merged the
remaining ones into a set of integrated results. For this, we used a method
based on Non-Maximum Suppression (NMS) [108], [44]. When multiple
bounding boxes overlap, NMS retains the highest-scored bounding box
and eliminates any other whose overlap ratio exceeds a preset threshold. We
employed Pascal's overlap score [24] to find the overlap ratio a0 between two
boxes B1 and B2.
This ratio is obtained as:

a0 = area(B1 ∩ B2) / area(B1 ∪ B2)        (3.3)
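The overlap test of Eq. (3.3) and the greedy suppression step can be sketched in a few lines. This is a minimal version assuming boxes as (x1, y1, x2, y2) tuples, not the exact implementation of [108], [44]:

```python
def iou(a, b):
    """Pascal overlap score a0 = area(A ∩ B) / area(A ∪ B)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scored box; drop others overlapping it by > thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```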
The output of the detection stage is a list of candidate objects that have been
labeled with the class they belong to (traffic sign, vehicle, traffic light, and
pedestrian). Except for pedestrian objects, the remaining objects from the
list are considered for further analysis at this stage. Using ResNet101, we
separately trained three independent models on traffic signs, vehicles, and
traffic lights to recognize the remaining objects. After a candidate object
(hypothesis) is fed into its corresponding model, the classifier either rejects it
or recognizes it, in which case it responds with the appropriate class name.
More precisely, the traffic light recognizer is able to classify traffic light hypotheses
Figure 3.8: Output samples from the proposed framework superimposed on the
attentional visual field of the driver
into five classes, the vehicle recognizer classifies vehicle hypotheses into
four classes, and the traffic sign recognizer classifies traffic sign hypotheses
into twenty classes. Figure 3.8 shows sample results from the proposed
framework for the four classes of traffic objects.
stages in detail.
3.4.1 Parameters
We applied a threshold to the score values provided by each SVM model;
ROIs were considered for post-processing only if their SVM score exceeded
the threshold. These scores range from 0 (definitely negative) to 1 (definitely
positive). We selected the threshold that retained the maximum number of
true positives. While some false positives passed this stage, they could mostly
be eliminated in the following stage of processing.
Threshold values of 0.50, 0.40, 0.40, and 0.60 were applied to the SVM
models for detection of traffic signs, pedestrians, traffic lights, and vehicles
respectively. These values provided the best results. We also utilized different
augmentation methods to improve the performance of our models. Table 3.1
lists the methods we have used to augment our data.
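The per-class thresholding amounts to a simple filter over scored ROIs. The sketch below uses the threshold values reported above; the ROI tuple format and function name are assumptions for illustration:

```python
# Thresholds reported above for each SVM detector.
THRESHOLDS = {"sign": 0.50, "pedestrian": 0.40, "light": 0.40, "vehicle": 0.60}

def filter_rois(rois):
    """Keep ROIs whose SVM confidence exceeds their class threshold.

    rois: iterable of (class_name, score, box) tuples.
    """
    return [r for r in rois if r[1] > THRESHOLDS[r[0]]]

kept = filter_rois([("sign", 0.62, None), ("vehicle", 0.55, None),
                    ("pedestrian", 0.41, None)])
```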
In the following subsections, we discuss the results obtained for the object
detection stage in detail.
To verify the accuracy of the object detection stage, we report the Detection
Rate (DR) and the number of False Positives Per Frame (FPPF), defined as
follows:
Figure 3.9: Confusion matrix from trained ResNet101 for labelling of traffic
object classes.
DR = TP / (TP + FN)        (3.4)

FPPF = FP / F        (3.5)
where TP is the number of correctly detected objects, FN is the number of
objects that are wrongly not detected, FP is the number of incorrectly detected
objects, and F is the total number of frames.
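Equations (3.4) and (3.5) translate directly into code; the counts in the usage line are hypothetical, chosen only to illustrate the computation:

```python
def detection_metrics(tp, fn, fp, frames):
    """Detection Rate (Eq. 3.4) and False Positives Per Frame (Eq. 3.5)."""
    dr = tp / (tp + fn)   # fraction of objects correctly detected
    fppf = fp / frames    # false detections per processed frame
    return dr, fppf

# Hypothetical counts for illustration:
dr, fppf = detection_metrics(tp=910, fn=90, fp=60, frames=1000)
```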
Moreover, Table 3.2 includes F1-scores for different traffic objects. As can
be seen, our model achieved 0.91, 0.90 and 0.06 for DR, F1-score and FPPF
respectively. Previously, Zabihi et al. [108] detected traffic signs only from the
RoadLAB dataset and reported 0.84 for DR and 0.04 for FPPF (last row of
Table 3.2). Their model was based on traditional machine learning methods.
They employed a linear SVM as a classifier and a HOG as traffic sign features
for the detection stage. Our model for traffic sign detection, when compared
with the work of Zabihi et al. [108], achieves a DR that is 0.07 higher, with an
increase in FPPF of 0.02. Recent studies have compared several object
detection models, including Faster R-CNN [78], Fast R-CNN [29], YOLO [75],
and YOLOv3 [77]. Faster R-CNN performs better than R-CNN and Fast
R-CNN. However, as mentioned, Faster R-CNN struggles with small objects.
According to [75] and [77], YOLO struggles with small objects as well, while
YOLOv3 struggles with larger objects. Using our framework, we were able to
detect objects of
different sizes. Figure 3.10 illustrates the performance of our detector using a
Receiver Operating Characteristic (ROC) curve, comparing the True Positive
Rate (TPR) to the False Positive Rate (FPR). In Figure 3.10, Class1, Class2,
Class3, and Class4 represent object classes for pedestrians, traffic signs, traffic
lights and vehicles, respectively.
neural networks. To analyze the trust of a network, they use the concept
of a trust spectrum to investigate overall trust across both correctly and
incorrectly answered questions [102]. The trust spectrum provides valuable
insight into when trust can break down. The trust spectrum in Figure 3.11
illustrates the overall trust for the four classes: pedestrian, traffic sign, traffic
light, and vehicle. As can be seen, the vehicle class achieved the highest trust,
while the pedestrian class obtained the lowest.
The object recognition stage is applied to the output of the object detection
stage to recognize hypotheses and to provide a classification result. We trained
three separate ResNet101 models for classes corresponding to traffic signs,
traffic lights, and vehicles using our training dataset. To verify the accuracy
of the object recognition stage, we computed the confusion matrix for each
class, as displayed in Figures 3.12, 3.13 and 3.14.
Figure 3.12: Confusion matrix from trained ResNet101 for traffic sign recog-
nition.
Figure 3.13: Confusion matrix from trained ResNet101 for traffic light recog-
nition.
Figure 3.14: Confusion matrix from trained ResNet101 for vehicle recognition.
Results for traffic sign recognition (Figure 3.12) show that the model reached
96.1% accuracy on our Canadian traffic sign dataset. The large values
along the main diagonal indicate that the majority of test sign images
were classified correctly. The lowest correct-classification rate, 83.3%, was
obtained for the class PedestrianCrossover.
Figure 3.13 illustrates the confusion matrix for traffic light recognition. The
results show that the model reached 96.2% overall correct classification.
As can be seen, the lowest rate of correct classification belongs to the class
NotClear, while the classes Green and Red obtained 98.8% and 99.2%,
respectively.
The results shown in Figure 3.14 indicate that the vehicle recognizer
achieved 94.8% overall correct classification. The confusion matrix shows
that this model is able to discriminate among vehicle objects (i.e., car, bus,
and truck) with less than 3% mislabeling error. The background class achieved
the lowest accuracy, at 87.3%.
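The overall and per-class rates quoted in this subsection come directly from the confusion matrices; the computation can be sketched as follows (the 3x3 matrix is illustrative, not the thesis's actual counts):

```python
def accuracies(cm):
    """Overall and per-class accuracy from a square confusion matrix.

    cm[i][j] = number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    overall = sum(cm[i][i] for i in range(len(cm))) / total
    per_class = [row[i] / sum(row) for i, row in enumerate(cm)]
    return overall, per_class

# Illustrative counts only:
cm = [[95, 3, 2],
      [4, 90, 6],
      [1, 1, 98]]
overall, per_class = accuracies(cm)
```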
3.5 Conclusion
We conducted a literature review of detection and recognition approaches for
four important classes of traffic objects including traffic signs, vehicles, pedes-
trians and traffic lights. Generally, the availability of suitable and adequate
training data is a vital element in the learning process in order to achieve a
discriminative model. In this work, we collected over 10,000 object sample im-
ages from sequences belonging to the RoadLAB initiative [3]. We also enriched
our training data using augmentation and a HEM strategy. We localized the
attentional visual area of the driver onto the imaging plane of the forward
stereoscopic system, and a framework for the detection and recognition of
traffic objects located inside and outside the attentional visual field of the
driver was devised. This information helps an ADAS infer which objects have
been seen by the driver, namely those falling inside the driver's gaze area. We
considered 3, 4, and 19 different classes for vehicles, traffic lights, and traffic signs
respectively. The object detection stage was built from a combination of both
traditional and deep learning-based models to detect objects at various scales.
Finally, in the recognition stage, by means of trained ResNet101 networks, our
framework achieved 96.1%, 96.2% and 94.8% of correct classification for traffic
signs, traffic lights, and vehicles respectively.
Bibliography
[9] Z. Cai, Y. Li, and M. Gu. “Real-time recognition system of traffic light
in urban environment”. In: 2012 IEEE Symposium on Computational
Intelligence for Security and Defence Applications. IEEE. 2012, pp. 1–
6.
[11] Long Chen, Qingquan Li, Ming Li, and Qingzhou Mao. “Traffic sign
detection and recognition for intelligent vehicle”. In: IEEE Intelligent
Vehicles Symposium (IV). 2011, pp. 908–913.
[12] Q. Chen, Z. Shi, and Z. Zou. “Robust and real-time traffic light recog-
nition based on hierarchical vision architecture”. In: 2014 7th Interna-
tional Congress on Image and Signal Processing. IEEE. 2014, pp. 114–
119.
[13] Mingxi Cheng, Shahin Nazarian, and Paul Bogdan. “There is hope after
all: Quantifying opinion and trustworthiness in neural networks”. In:
Frontiers in Artificial Intelligence 3 (2020), p. 54.
[14] Dan Cireşan, Ueli Meier, Jonathan Masci, and Jürgen Schmidhuber.
“Multi-column deep neural network for traffic sign classification”. In:
Neural networks 32 (2012), pp. 333–338.
[15] Dan Cireşan, Ueli Meier, Jonathan Masci, and Jürgen Schmidhuber.
“A committee of neural networks for traffic sign classification”. In: The
2011 international joint conference on neural networks. IEEE. 2011,
pp. 1918–1921.
[16] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. “R-fcn: Object detec-
tion via region-based fully convolutional networks”. In: arXiv preprint
arXiv:1605.06409 (2016).
[17] Navneet Dalal and Bill Triggs. “Histograms of oriented gradients for hu-
man detection”. In: IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR). Vol. 1. 2005, pp. 886–893.
[22] Jan Dürre, Dario Paradzik, and Holger Blume. “A HOG-based real-time
and multi-scale pedestrian detector demonstration system on FPGA”.
In: Proceedings of the 2018 ACM/SIGDA International Symposium on
Field-Programmable Gate Arrays. 2018, pp. 163–172.
[23] J.E. Espinosa, S.A. Velastin, and J.W. Branch. “Vehicle detection using
alex net and faster R-CNN deep learning models: a comparative study”.
In: International Visual Informatics Conference. Springer. 2017, pp. 3–
15.
[26] U. Franke and I. Kutzbach. “Fast stereo based object detection for
stop&go traffic”. In: Proceedings of Conference on Intelligent Vehicles.
IEEE. 1996, pp. 339–344.
[27] C. Fu, W. Liu, A. Ranga, A. Tyagi, and A.C. Berg. “Dssd: Decon-
volutional single shot detector”. In: arXiv preprint arXiv:1701.06659
(2017).
107
[35] M.A. Hannan, A. Hussain, S.A. Samad, and K.A. Ishak. “A unified
robust algorithm for detection of human and non-human object in in-
telligent safety application”. In: International Journal of Computer and
Information Engineering 2.11 (2008).
108
[36] K. He, G. Gkioxari, P. Dollár, and R. Girshick. “Mask r-cnn”. In: Pro-
ceedings of the IEEE international conference on computer vision. 2017,
pp. 2961–2969.
[37] K. He, X. Zhang, S. Ren, and J. Sun. “Deep residual learning for image
recognition”. In: Proceedings of the IEEE conference on computer vision
and pattern recognition. 2016, pp. 770–778.
[38] K. He, X. Zhang, S. Ren, and J. Sun. “Spatial pyramid pooling in deep
convolutional networks for visual recognition”. In: IEEE transactions
on pattern analysis and machine intelligence 37.9 (2015), pp. 1904–
1916.
[40] T. Hoang Ngan Le, Y. Zheng, C. Zhu, K. Luu, and M. Savvides. “Mul-
tiple scale faster-rcnn approach to driver’s cell-phone usage and hands
on steering wheel detection”. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition Workshops. 2016, pp. 46–
53.
[43] G.X. Hu, Z. Yang, L. Hu, L. Huang, and J.M. Han. “Small Object De-
tection with Multiscale Features”. In: International Journal of Digital
Multimedia Broadcasting 2018 (2018).
[46] H. Ji, Z. Gao, T. Mei, and Y. Li. “Improved Faster R-CNN With Mul-
tiscale Feature Fusion and Homography Augmentation for Vehicle De-
tection in Remote Sensing Images”. In: IEEE Geoscience and Remote
Sensing Letters (2019).
[48] V. John, K. Yoneda, B. Qi, Z. Liu, and S. Mita. “Traffic light recognition
in varying illumination using deep learning and saliency map”. In: 17th
International IEEE Conference on Intelligent Transportation Systems
(ITSC). IEEE. 2014, pp. 2286–2291.
110
[49] N. Khairdoost, S.A. Monadjemi, and K. Jamshidi. “Front and rear ve-
hicle detection using hypothesis generation and verification”. In: Signal
& Image Processing 4.4 (2013), p. 31.
[52] A.D. Kumar. “Novel deep learning model for traffic sign detection using
capsule networks”. In: arXiv preprint arXiv:1805.04424 (2018).
[54] W. Kuo and C. Lin. “Two-stage road sign detection and recognition”.
In: 2007 IEEE international conference on multimedia and expo. IEEE.
2007, pp. 1427–1430.
[55] Y. Lai, N. Wang, Y. Yang, and L. Lin. “Traffic Signs Recognition and
Classification based on Deep Feature Learning.” In: ICPRAM. 2018,
pp. 622–629.
111
[57] G. Lee and B.K. Park. “Traffic light recognition using deep neural net-
works”. In: 2017 IEEE international conference on consumer electron-
ics (ICCE). IEEE. 2017, pp. 277–278.
[58] K. Lim, Y. Hong, Y. Choi, and H. Byun. “Real-time traffic sign recog-
nition based on a general purpose GPU and deep-learning”. In: PLoS
one 12.3 (2017), e0173317.
[60] C. Lin and M. Wang. “Road sign recognition with fuzzy adaptive pre-
processing models”. In: Sensors 12.5 (2012), pp. 6415–6433.
[61] H. Lin. SVM. http : / / www . work . caltech . edu / ~htlin / program /
libsvm/.
[63] Y. Liu, Y. Lu, Q. Shi, and J. Ding. “Optical flow based urban road
vehicle tracking”. In: 2013 Ninth International Conference on Compu-
tational Intelligence and Security. IEEE. 2013, pp. 391–395.
112
[64] Markus Mathias, Radu Timofte, Rodrigo Benenson, and Luc Van Gool.
“Traffic sign recognition—How far are we from the solution?” In: The
International Joint Conference on Neural Networks (IJCNN). 2013,
pp. 1–8.
[65] Q. Ming and K. Jo. “Vehicle detection using tail light segmentation”. In:
Proceedings of 2011 6th International Forum on Strategic Technology.
Vol. 2. IEEE. 2011, pp. 729–732.
[66] K. Mu, F. Hui, X. Zhao, and C. Prehofer. “Multiscale edge fusion for
vehicle detection based on difference of Gaussian”. In: Optik 127.11
(2016), pp. 4794–4798.
[67] M. Najibi, M. Rastegari, and L.S. Davis. “G-cnn: an iterative grid based
object detector”. In: Proceedings of the IEEE conference on computer
vision and pattern recognition. 2016, pp. 2369–2377.
[70] E. Ohn-Bar and M. Trivedi. “Fast and robust object detection using
visual subcategories”. In: Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition Workshops. 2014, pp. 179–184.
[71] M. Omachi and S. Omachi. “Traffic light detection with color and edge
information”. In: 2009 2nd IEEE International Conference on Com-
puter Science and Information Technology. IEEE. 2009, pp. 284–287.
113
[78] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster r-cnn:
Towards real-time object detection with region proposal networks”. In:
arXiv preprint arXiv:1506.01497 (2015).
114
[79] Y. Saadna and A. Behloul. “An overview of traffic sign detection and
classification methods”. In: International Journal of Multimedia Infor-
mation Retrieval 6.3 (2017), pp. 193–210.
[82] Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen,
and Xiangyang Xue. “Dsod: Learning deeply supervised object detec-
tors from scratch”. In: Proceedings of the IEEE international conference
on computer vision. 2017, pp. 1919–1927.
[83] Z. Shi, Z. Zou, and C. Zhang. “Real-time traffic light detection with
adaptive background suppression filter”. In: IEEE Transactions on In-
telligent Transportation Systems 17.3 (2015), pp. 690–700.
[88] Kenji Takagi, Haruki Kawanaka, Md. Shoaib Bhuiyan, and Koji Oguri.
“Estimation of a three-dimensional gaze point and the gaze target from
the road images”. In: Intelligent Transportation Systems (ITSC), 14th
International IEEE Conference on. IEEE. 2011, pp. 526–531.
[89] B. Tian, B.T. Morris, M. Tang, Y. Liu, Y. Yao, C. Gou, D. Shen, and
S. Tang. “Hierarchical and Networked Vehicle Surveillance in ITS: A
Survey”. In: IEEE Transactions on Intelligent Transportation Systems
18.1 (2017), p. 25.
[93] C. Tzomakas and W. von Seelen. “Vehicle detection in traffic scenes us-
ing shadows”. In: Ir-Ini, Institut fur Nueroinformatik, Ruhr-Universitat.
Citeseer. 1998.
116
[94] P. Viola, M.J. Jones, and D. Snow. “Detecting pedestrians using pat-
terns of motion and appearance”. In: International Journal of Com-
puter Vision 63.2 (2005), pp. 153–161.
[95] Paul Viola and Michael J Jones. “Robust real-time face detection”. In:
International journal of computer vision 57.2 (2004), pp. 137–154.
[97] X. Wang, T. Han, and S. Yan. “An HOG-LBP human detector with
partial occlusion handling”. In: 2009 IEEE 12th international confer-
ence on computer vision. IEEE. 2009, pp. 32–39.
[98] X. Wang, M. Yang, S. Zhu, and Y. Lin. “Regionlets for generic ob-
ject detection”. In: Proceedings of the IEEE international conference
on computer vision. 2013, pp. 17–24.
[99] M. Weber, P. Wolf, and J.M. Zöllner. “DeepTLR: A single deep con-
volutional network for detection and classification of traffic lights”. In:
2016 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2016, pp. 342–
348.
[100] X. Wen, L. Shao, W. Fang, and Y. Xue. “Efficient feature selection and
classification for vehicle detection”. In: IEEE Transactions on Circuits
and Systems for Video Technology 25.3 (2014), pp. 508–517.
[102] Alexander Wong, Xiao Yu Wang, and Andrew Hryniowski. “How Much
Can We Really Trust You? Towards Simple, Interpretable Trust Quan-
tification Metrics for Deep Neural Networks”. In: arXiv preprint arXiv:2009.05835
(2020).
[105] G. Yan, M. Yu, Y. Yu, and L. Fan. “Real-time vehicle detection using
histograms of oriented gradients and AdaBoost classification”. In: Optik
127.19 (2016), pp. 7941–7951.
[106] S. Yin, P. Ouyang, L. Liu, Y. Guo, and S. Wei. “Fast traffic sign recogni-
tion with a rotation invariant binary pattern based feature”. In: Sensors
15.1 (2015), pp. 2161–2180.
[107] Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony S Paek, and
In So Kweon. “Attentionnet: Aggregating weak directions for accurate
object detection”. In: Proceedings of the IEEE International Conference
on Computer Vision. 2015, pp. 2659–2667.
[108] S.J. Zabihi, S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Detec-
tion and recognition of traffic signs inside the attentional visual field
of drivers”. In: 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE.
2017, pp. 583–588.
118
[111] L. Zhang, L. Lin, X. Liang, and K. He. “Is faster r-cnn doing well
for pedestrian detection?” In: European conference on computer vision.
Springer. 2016, pp. 443–457.
[112] Xiaotong Zhao, Wei Li, Yifang Zhang, T Aaron Gulliver, Shuo Chang,
and Zhiyong Feng. “A faster RCNN-based pedestrian detection sys-
tem”. In: 2016 IEEE 84th Vehicular Technology Conference (VTC-
Fall). IEEE. 2016, pp. 1–5.
[113] Z. Zhao, P. Zheng, S. Xu, and X. Wu. “Object detection with deep
learning: A review”. In: IEEE transactions on neural networks and
learning systems (2019).
Chapter 4
the RoadLAB instrumented vehicle. Our experimental results show that our
approach achieved promising results in the lane detection stage and an accuracy
of 94.52% in the lane type classification stage.
4.1 Introduction
Nowadays, almost every new vehicle features some type of Advanced Driving
Assistance System (ADAS), such as adaptive cruise control, blind-spot detection,
collision avoidance, traffic sign detection, overtaking assistance, and parking
assistance. ADASs generally increase safety and reduce driver workload.
Lane detection constitutes one of the fundamental functions found in
autonomous driving systems and ADASs. Lane boundaries provide the infor-
mation required for estimating the lateral position of a vehicle on the road,
enabling systems such as lane departure warning, overtaking assistance, intel-
ligent cruise control, and trajectory planning.
Lane detection approaches are categorized into two groups: classical and
deep learning methods. The traditional lane detection methods usually employ
a number of computer vision and image processing techniques to extract spe-
cialized features and to identify the location of lane segments. Subsequently,
post-processing techniques remove false detections and join sub-segments to
obtain final road lane positions. In general, these traditional approaches suffer
from performance issues when they encounter challenging illumination condi-
tions and complex road scenes.
Recently, deep learning-based methods have been employed to provide reliable
solutions to the lane detection problem. Methods based on CNNs fall into two
categories, namely segmentation-based methods and Generative Adversarial
Network (GAN)-based methods [26]. Chougule et al. [6] proposed a CNN-based
regression network for reliable multilane detection and classification.
In this section, we survey both traditional and deep learning methods for lane
marking recognition and classification.
Segmentation methods for lane marker detection fall mainly into two groups:
1) Semantic Segmentation and 2) Instance Segmentation. In the first
group, each pixel is classified by a binary label indicating whether it belongs
to a lane or not. For instance, in [9], the authors presented a CNN-based
framework that utilizes front-view and top-view image regions to detect lanes.
Following this, they used a global optimization step to reach a combination
of accurate lane lines. Lee et al. [14] proposed a Vanishing Point Guided Net
(VPGNet) model that simultaneously performs lane detection and road mark-
ing recognition under different weather conditions. Their data was captured
in a downtown area of Seoul, South Korea.
GANs have also been used for lane detection. Liu et al. [17] presented a
style-transfer-based data enhancement approach that uses GANs [8] to generate
images under low-light conditions, improving the environmental adaptability of
the model. Their method requires neither additional annotation nor extra
inference overhead. Ghafoorian et al. [7] proposed an Embedding Loss GAN
(EL-GAN) for lane detection.
To identify the ego lane boundaries in the road image, a regression-based net-
work is utilized that outputs two vectors representing the coordinate points of
the left and right boundaries from the ego lane. Each coordinate vector con-
sists of 14 coordinates (x, y) on the image plane indicating sampled positions
for the ego lane boundary. To construct this model, a pre-trained AlexNet
architecture is utilized. First, the last two fully connected layers are removed
from the network and then four-level cascaded layers are added to the first
six layers of AlexNet to complete the lane detection model. These four-level
cascaded layers contain two branches of two back-to-back fully connected lay-
ers, a concatenation layer and a regression layer, as shown in Figure 4.1. This
branched architecture minimizes misclassifications of the detected lane points
[6]. Moreover, this architecture is capable of detecting the road boundary as an
assumed ego lane left/right boundary when there is no actual lane marking.
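The branched regression head described above can be sketched as follows. This is a minimal PyTorch sketch assuming AlexNet-sized input features (256 × 6 × 6); the hidden-layer width is illustrative, not the exact configuration used in [6]:

```python
import torch
import torch.nn as nn

class LaneRegressionHead(nn.Module):
    """Branched regression head producing two 28-value vectors:
    14 (x, y) points for each of the left and right ego-lane boundaries.
    Layer sizes are illustrative, not the thesis configuration."""
    def __init__(self, in_features: int = 256 * 6 * 6):
        super().__init__()
        # One branch per lane boundary: two back-to-back fully connected layers.
        def branch() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(in_features, 1024), nn.ReLU(inplace=True),
                nn.Linear(1024, 28),  # 14 points * (x, y)
            )
        self.left, self.right = branch(), branch()

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        feats = feats.flatten(1)
        # Concatenate the two branch outputs, mirroring the concatenation
        # and regression layers shown in Figure 4.1.
        return torch.cat([self.left(feats), self.right(feats)], dim=1)

head = LaneRegressionHead()
out = head(torch.zeros(2, 256, 6, 6))
print(out.shape)  # torch.Size([2, 56])
```

Keeping the two boundaries in separate branches is what lets each branch specialize, which is the property credited in [6] with reducing misclassified lane points.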
In this section, we introduce our lane detection dataset extracted from the
driving sequences captured with the RoadLAB instrumented vehicle [2] (see
Figure 4.2). Our experimental vehicle was used to collect driving sequences
from 16 drivers on a pre-determined 28.5 km route within the city of London,
Ontario, Canada (see Figure 4.3). Data frames were collected at a rate
of 30Hz with a resolution of 320 × 240. We used 12 driving sequences, as
described in Table 4.1, to derive our dataset containing 5782 images along
with their corresponding lane annotations. Figure 4.4 illustrates examples
from our derived dataset.
An essential element of any deep learning-based system is the availabil-
ity of large numbers of sample images. Data augmentation is a commonly
used strategy to significantly expand an existing dataset by generating unique
samples through transformations of images in the dataset. Applying a data
augmentation strategy reduces overfitting in the network. We employed such
techniques to enrich the dataset, resulting in improved performance at the
lane detection stage.
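One such transformation can be sketched as a photometric jitter; this is a generic augmentation sketch (the thesis does not list its exact operations, so `augment_photometric` and its parameter ranges are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_photometric(img: np.ndarray) -> np.ndarray:
    """Random brightness/contrast jitter for a uint8 RGB frame.
    A generic augmentation sketch, not the thesis' exact recipe."""
    alpha = rng.uniform(0.8, 1.2)   # contrast factor
    beta = rng.uniform(-20.0, 20.0) # brightness shift
    out = img.astype(np.float32) * alpha + beta
    return np.clip(out, 0, 255).astype(np.uint8)

frame = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
aug = augment_photometric(frame)
print(aug.shape, aug.dtype)
```

Note that geometric transforms need care here: a horizontal flip, for instance, would swap the left/right boundary annotations, so any such transform must also update the labels.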
Figure 4.1: The lane detection model provides two lane vectors, each consisting
of 14 coordinates in the image plane that represent the predicted left and right
boundaries of the ego lane.
Figure 4.2: Forward stereoscopic vision system mounted on rooftop of the Road-
LAB experimental vehicle.
Figure 4.3: Map of the predetermined course for drivers, located in London,
Ontario, Canada. The path includes urban and suburban driving areas and is
approximately 28.5 kilometers long.
solid yellow, dashed-solid yellow, solid-dashed yellow, and road boundary. The
road boundary type specifies the edge of the road when an actual lane marking
does not exist.
The lane type classification stage receives the output of lane detection (14 co-
ordinates in the image plane for each predicted ego lane boundary) as input.
We first identify the ROI for each lane boundary separately. Each ROI fits
the detected ego lane boundary as per its corresponding predicted coordinates.
Next, we apply a projective transformation to each ROI to obtain an image
where the lane marking aligns in the center of the resulting image. Afterwards,
we crop the middle rectangular part of the transformed image that contains
the lane type information. Finally, we apply our trained ResNet101 network
to classify the resulting images obtained for each lane boundary into the
aforementioned eight classes. Figure 4.5 illustrates the lane type classification
stage.
Figure 4.5: Visualization of the lane type classification stage, from a sample
road image to the ego lane boundaries.
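The projective rectification step of this pipeline can be sketched with a plain-NumPy homography; `homography` and `warp` are illustrative helpers, not the implementation used in the thesis:

```python
import numpy as np

def homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate the 3x3 projective transform mapping four src points
    to four dst points (direct linear transform)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    return Vt[-1].reshape(3, 3)  # null vector of A, up to scale

def warp(img: np.ndarray, H: np.ndarray, out_shape: tuple) -> np.ndarray:
    """Inverse-map every output pixel through H^-1 (nearest neighbour)."""
    h, w = out_shape
    Hinv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx, sy, sw = Hinv @ pts
    sx = (sx / sw).round().astype(int)
    sy = (sy / sw).round().astype(int)
    out = np.zeros((h, w), dtype=img.dtype)
    ok = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out.ravel()[ok] = img[sy[ok], sx[ok]]
    return out

# Map a quadrilateral ROI around a detected boundary onto a vertical strip,
# so the lane marking aligns in the centre of the result.
img = np.full((100, 100), 7, dtype=np.uint8)
src = np.array([[10, 10], [90, 10], [90, 90], [10, 90]], dtype=float)
dst = np.array([[0, 0], [31, 0], [31, 99], [0, 99]], dtype=float)
roi = warp(img, homography(src, dst), (100, 32))
print(roi.shape)  # (100, 32)
```

The rectified strip is then cropped and handed to the ResNet101 classifier, as described above.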
In order to train and test our lane type classification model, we collected 10571
sample lane boundary images from the outputs of the lane detection model.
These samples are inputs to our ResNet101 model, as they contain the lane
type information. Figure 4.6 shows samples of our dataset for the eight lane
boundary types.
To further enrich our lane type dataset for training, we employed two dif-
Figure 4.6: Lane boundary samples of our train-and-test data a) Dashed White,
b) Dashed Yellow, c) Solid White, d) Solid Yellow, e) Double Solid Yellow f )
Dashed-Solid Yellow, g) Solid-Dashed Yellow, h) Road Boundary
To perform the experiments, we applied the model to the unseen test data
extracted from our driving sequences [2]. To evaluate the performance of the
lane detection stage, we used a metric suggested by [6]: we compute the mean
error between the predicted lane coordinates generated by the lane coordinate
model and the corresponding ground truth values as a Euclidean distance (in
pixels) for each lane boundary. For each single lane boundary, the
Mean Prediction Error (MPE) is computed as follows (see Figure 4.7):
\[ \mathrm{MPE} = \frac{1}{14} \sum_{i=1}^{14} \sqrt{(x_{p_i} - x_{g_i})^2 + (y_{p_i} - y_{g_i})^2} \tag{4.1} \]
where $(x_{p_i}, y_{p_i})$ and $(x_{g_i}, y_{g_i})$ indicate the predicted lane
coordinates and the corresponding ground truth coordinates, respectively.
Additionally, during network training, we investigated the performance of the
following two loss functions, $L_1$ and $L_2$, at the lane detection stage:
\[ L_1 = \sum_{i=1}^{14} |x_{p_i} - x_{g_i}| + \sum_{i=1}^{14} |y_{p_i} - y_{g_i}| \tag{4.2} \]

\[ L_2 = \sum_{i=1}^{14} (x_{p_i} - x_{g_i})^2 + \sum_{i=1}^{14} (y_{p_i} - y_{g_i})^2 \tag{4.3} \]
where the L1 loss computes the absolute differences between the predicted
and actual values while the L2 loss, also known as the Squared Error Loss,
computes the squared differences between the predicted and actual values.
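The MPE and the two loss functions above translate directly into code; a NumPy sketch, assuming each lane boundary is an array of 14 (x, y) points:

```python
import numpy as np

def mpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Prediction Error (Eq. 4.1): mean Euclidean distance, in pixels,
    over the 14 boundary points. pred and gt have shape (14, 2)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=1)))

def l1_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Eq. 4.2: sum of absolute coordinate differences."""
    return float(np.abs(pred - gt).sum())

def l2_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Eq. 4.3 (Squared Error Loss): sum of squared coordinate differences."""
    return float(((pred - gt) ** 2).sum())

gt = np.zeros((14, 2))
pred = np.full((14, 2), [3.0, 4.0])  # every point off by a 3-4-5 triangle
print(mpe(pred, gt), l1_loss(pred, gt), l2_loss(pred, gt))  # 5.0 98.0 350.0
```

The worked values follow the definitions: each point is 5 pixels away, so the MPE is 5.0, while the $L_1$ and $L_2$ losses sum over all 28 coordinate differences.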
In Table 4.3, we report the performance of the lane detection stage de-
scribed in Section 4.3.1 for the ego lane left/right boundaries using the afore-
mentioned loss functions. As observed from Table 4.3, the L1 loss function is
superior to L2.
Figure 4.7: Visualization of the Euclidean error between the predicted lane
coordinates and the corresponding ground truth coordinates.
our dataset to verify and categorize the localized lane boundaries into eight
classes of lane types. To verify the accuracy of the lane type classification
stage, we computed the confusion matrix from the ResNet101 model on the
test data (See Figure 4.8). The results show that the model reaches 94.52% of
overall correct classification. This model is able to discriminate the eight lane
types with less than 4.2% of mislabeling error. The dashed-solid yellow class
obtained the lowest rate of correct classification, while the double solid
yellow class reached 97.7%. As mentioned, the authors in [24] recognized
five lane marking types: dashed, dashed-solid, double solid, solid-dashed,
and single solid. They applied their model to three different test sets and
obtained corresponding confusion matrices with overall correct classification
rates of 71.53%, 77.27%, and 85.42%, all lower than our overall correct
classification. Figure 4.9 displays samples of the visual outputs
from our system for the eight classes of lane boundary types.
Figure 4.8: Confusion matrix from ResNet101 for lane type classification.
4.5 Conclusions
In general, few works in the literature classify lane types. In this study,
we presented a CNN-based framework to detect and classify lane types in urban
and suburban driving environments, which have also been studied less than
highways. To perform the lane detection and classification stages, we created
an image dataset for each from sequences captured in
different illumination conditions created by the RoadLAB initiative [2]. We
also enriched our training data using data augmentation and a hard example
mining strategy. To detect lanes, we used a network which generates lane
information in terms of image coordinates in an end-to-end way. In the lane
type classification stage, we utilized our trained ResNet101 network to cate-
gorize the detected lane boundaries into eight classes including dashed white,
dashed yellow, solid white, solid yellow, double solid yellow, dashed-solid
yellow, solid-dashed yellow, and road boundary. Finally, our results showed
that the ResNet101 model achieved over 94% correct lane type classification,
higher than the previous work [24]. In addition, our model recognizes three
more classes of lane types, notably the road boundary type, which is relevant
in urban areas where no actual lane marking exists.
Bibliography
[2] S.S. Beauchemin, M.A. Bauer, T. Kowsari, and J. Cho. “Portable and
Scalable Vision-Based Vehicular Instrumentation for the Analysis of
Driver Intentionality”. In: IEEE Transactions on Instrumentation and
Measurement 61.2 (2012), pp. 391–401.
[3] Hsu-Yung Cheng, Bor-Shenn Jeng, Pei-Ting Tseng, and Kuo-Chin Fan.
“Lane detection with moving vehicles in the traffic scenes”. In: IEEE
Transactions on intelligent transportation systems 7.4 (2006), pp. 571–
582.
[4] Kuo-Yu Chiu and Sheng-Fuu Lin. “Lane detection using color-based
segmentation”. In: IEEE Intelligent Vehicles Symposium. IEEE. 2005,
pp. 706–711.
[5] Hyun-Chul Choi and Se-Young Oh. “Illumination invariant lane color
recognition by using road color reference & neural networks”. In: The
2010 International Joint Conference on Neural Networks (IJCNN).
IEEE. 2010, pp. 1–5.
[6] Shriyash Chougule, Nora Koznek, Asad Ismail, Ganesh Adam, Vikram
Narayan, and Matthias Schulze. “Reliable multilane detection and clas-
sification by utilizing CNN as a regression network”. In: Proceedings of
the European Conference on Computer Vision (ECCV). 2018.
[7] Mohsen Ghafoorian, Cedric Nugteren, Nóra Baka, Olaf Booij, and Michael
Hofmann. “El-gan: Embedding loss driven generative adversarial net-
works for lane detection”. In: Proceedings of the European Conference
on Computer Vision (ECCV). 2018.
[8] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. “Gen-
erative adversarial nets”. In: Advances in neural information processing
systems. 2014, pp. 2672–2680.
[9] Bei He, Rui Ai, Yang Yan, and Xianpeng Lang. “Accurate and robust
lane detection based on dual-view convolutional neutral network”. In:
2016 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2016, pp. 1041–
1046.
[10] Toan Minh Hoang, Hyung Gil Hong, Husan Vokhidov, and Kang Ry-
oung Park. “Road lane detection by discriminating dashed and solid
road lanes using a visible light camera sensor”. In: Sensors 16.8 (2016),
p. 1313.
[11] Hayoung Kim, Jongwon Park, Kyushik Min, and Kunsoo Huh. “Anomaly
Monitoring Framework in Lane Detection With a Generative Adver-
sarial Network”. In: IEEE Transactions on Intelligent Transportation
Systems (2020).
[12] ZuWhan Kim. “Robust lane detection and tracking in challenging sce-
narios”. In: IEEE Transactions on Intelligent Transportation Systems
9.1 (2008), pp. 16–26.
[13] Chanho Lee and Ji-Hyun Moon. “Robust lane detection and tracking
for real-time applications”. In: IEEE Transactions on Intelligent Trans-
portation Systems 19.12 (2018), pp. 4043–4048.
[14] Seokju Lee, Junsik Kim, Jae Shin Yoon, Seunghak Shin, Oleksandr
Bailo, Namil Kim, Tae-Hee Lee, Hyun Seok Hong, Seung-Hoon Han,
and In So Kweon. “Vpgnet: Vanishing point guided network for lane and
road marking detection and recognition”. In: Proceedings of the IEEE
international conference on computer vision. 2017, pp. 1947–1955.
[15] Andre Linarth and Elli Angelopoulou. “On feature templates for par-
ticle filter based lane detection”. In: 2011 14th International IEEE
Conference on Intelligent Transportation Systems (ITSC). IEEE. 2011,
pp. 1721–1726.
[17] Tong Liu, Zhaowei Chen, Yi Yang, Zehao Wu, and Haowei Li. “Lane De-
tection in Low-light Conditions Using an Efficient Data Enhancement:
Light Conditions Style Transfer”. In: arXiv preprint arXiv:2002.01177
(2020).
[18] Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, and Jing-Jhih Lin.
“Efficient dense modules of asymmetric convolution for real-time semantic
segmentation”. In: Proceedings of the ACM Multimedia Asia. 2019.
[19] Shao-Yuan Lo, Hsueh-Ming Hang, Sheng-Wei Chan, and Jing-Jhih Lin.
“Multi-Class Lane Semantic Segmentation using Efficient Convolutional
Networks”. In: 2019 IEEE 21st International Workshop on Multimedia
Signal Processing (MMSP). IEEE. 2019, pp. 1–6.
[23] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou
Tang. “Spatial as deep: Spatial cnn for traffic scene understanding”.
In: Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[24] Mauricio Braga de Paula and Claudio Rosito Jung. “Real-time detec-
tion and classification of road lane markings”. In: 2013 XXVI Confer-
ence on Graphics, Patterns and Images. IEEE. 2013, pp. 83–90.
[25] Zamani Md Sani, Hadhrami Abd Ghani, Rosli Besar, Azizul Azizan,
and Hafiza Abas. “Real-Time Video Processing using Contour Num-
bers and Angles for Non-urban Road Marker Classification.” In: In-
ternational Journal of Electrical & Computer Engineering (2088-8708)
8.4 (2018).
[26] Seungwoo Yoo, Hee Seok Lee, Heesoo Myeong, Sungrack Yun, Hyoung-
woo Park, Janghoon Cho, and Duck Hoon Kim. “End-to-End Lane
Marker Detection via Row-wise Classification”. In: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops. 2020, pp. 1006–1007.
Chapter 5
The direction of a driver’s visual attention plays a crucial role in the con-
text of Advanced Driver Assistance Systems (ADASs) and semi-autonomous
driving. The way a driver monitors traffic scene objects partially indicates
the level of driver awareness. We propose an analytical method to estimate a
driver’s average traffic scene attention based on the attentional visual field of
the driver in urban and suburban areas. Three metrics are proposed to esti-
mate a driver’s average attention. Our model is capable of identifying driver
attention with respect to traffic objects including vehicles, traffic lights, traffic
signs, and pedestrians within the attentional visual field of the driver at any
moment while in the act of driving.
5.1 Introduction
The number of vehicles on the roads increases every day, making driving
safety and road congestion two significant problems. Preventing fatalities
and injuries from traffic accidents has become of great importance
for governments and vehicle manufacturers around the world. According to
the World Health Organization (WHO), the number of people killed in road
traffic accidents worldwide was approximately 1.25 million in 2013 and the
statistics show that higher-income countries have fewer road fatalities than
middle-income countries due to better emergency medical facilities, as well
as law enforcement [12]. According to previous studies, driver inattention is
one of the main causes of many accidents. Hence, in recent years, real-time
analysis of a driver’s gaze has attracted the attention of researchers looking
to predict driver behavior [9] in order to increase the safety of driving and
decrease the number of road accidents.
Object detection methods can be divided into two major types: traditional
and deep learning-based algorithms. Among the traditional object detectors,
the approach proposed by Viola and Jones benefits from sliding windows and
AdaBoost classifiers [22]. Another popular framework in this area is the
Support Vector Machine (SVM) classifier with features such as Histograms of
Oriented Gradients (HOG) and the Scale-Invariant Feature Transform (SIFT).
For example, in [4], the authors employed an SVM and a multi-scale searching
framework with HOG features to detect pedestrians.
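The HOG features underlying such detectors can be sketched as per-cell histograms of gradient orientation; this simplified sketch omits the block normalization of the full Dalal-Triggs descriptor, and in practice the resulting vector would feed an SVM:

```python
import numpy as np

def hog_features(gray: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Minimal HOG sketch: per-cell histograms of unsigned gradient
    orientation, weighted by gradient magnitude (no block normalization)."""
    gray = gray.astype(np.float64)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]    # central differences
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    h, w = gray.shape
    feats = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            m = mag[r:r + cell, c:c + cell].ravel()
            a = ang[r:r + cell, c:c + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

patch = np.random.default_rng(1).random((64, 128))
f = hog_features(patch)
print(f.shape)  # 8 x 16 cells x 9 bins = (1152,)
```

A multi-scale search then slides such a window over an image pyramid and scores each position with the classifier.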
Deep learning-based object detection approaches have attracted researchers’
attention since they have shown promising results in different applications. We
can divide deep learning-based object detection methods into two major cat-
egories: Region-based methods and Regression-based methods. The former
generates region proposals at the first step and then categorizes them into dif-
ferent object classes. Faster R-CNN [16], R-FCN [3] and SPP-net [5] are some
frameworks that follow this strategy. In our laboratory, we have utilized deep
neural networks (Faster R-CNN and ResNet) and classical machine learning
models (multi-scale HOG-SVM) to detect and recognize traffic objects includ-
ing traffic signs, vehicles, traffic lights, and pedestrians [21]. However, none
of this previous work has provided any analytical approaches related to the
traffic objects in the attentional visual field of the driver.
As mentioned, regression-based methods are the second category for object
detection based on deep learning, which view the object detection problem as
a regression problem and predict locations of objects directly from the whole
image. The regression-based methods mainly include YOLOv3 [15], DSOD
[18], YOLOv4 [2] as well as YOLOv5 [8]. In this work, we employed YOLOv5
as a traffic object detector along with the attentional visual field of the driver
to analyze average driver attention.
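A minimal sketch of how such detector output could be consumed follows. It assumes YOLO-style detection rows of the form (x1, y1, x2, y2, confidence, class id) and a confidence threshold; the class indices and the `filter_detections` helper are hypothetical, not the exact pipeline used in this work:

```python
import numpy as np

# Hypothetical class indices for the four traffic object types in this work.
CLASSES = {0: "vehicle", 1: "traffic light", 2: "traffic sign", 3: "pedestrian"}

def filter_detections(dets: np.ndarray, conf_thr: float = 0.5):
    """Keep detections above a confidence threshold and attach class labels.
    Assumes rows (x1, y1, x2, y2, confidence, class_id)."""
    kept = dets[dets[:, 4] >= conf_thr]
    return [((*row[:4],), float(row[4]), CLASSES[int(row[5])]) for row in kept]

dets = np.array([
    [10, 20, 50, 80, 0.92, 0],   # confident vehicle
    [60, 10, 70, 30, 0.31, 2],   # low-confidence sign, dropped
    [5, 5, 15, 40, 0.77, 3],     # pedestrian
])
for box, conf, label in filter_detections(dets):
    print(label, conf)
```

The surviving boxes are what is later intersected with the attentional visual field of the driver.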
Driver gaze has been studied in real driving environments and driving simu-
lators for many years. Generally, the driver’s gaze is captured by two main
instruments including eye glasses/headband and eye trackers. In this section,
we provide a short summary of several applications which employed the afore-
mentioned instruments to capture driver gaze.
Eye Glasses/Headband
Some researchers worked on driver gaze using eye glasses or headbands. For
instance, Jha et al. [6] presented a headband-based approach using Gaussian
Process Regression (GPR) that predicts the probability that the driver is
looking at a given point. Deep learning-based models have also been used for
similar purposes. In [7], a deep learning-based method using a headband was
proposed to predict the driver's visual attention. By
gradually upsampling the resolution of the gaze region, the authors increased
the accuracy and resolution of the prediction. Palazzi et al. [13] introduced
the dataset called the DR(eye)VE which was created using eye glasses. They
presented a model based on a multi-branch deep network. This model is
composed of three branches of convolutional networks for color, motion, and
scene semantics and their predictions are integrated to create the final map.
Eye Tracker
Another group of researchers captured the driver gaze information using eye
trackers. For instance, a CNN-based model was proposed in [23] for driver gaze
estimation in a vehicle environment that combined image information acquired
from the front and side cameras into one three-channel image as an input to
the model to increase recognition reliability and decrease computational cost.
Moreover, in [27], a four-channel gaze estimation model was proposed based
on CNN, which was used to estimate the gaze zones of the driver. The au-
thors achieved considerable accuracy in comparison with several other gaze
estimation methods. In [24], a novel self-calibrated approach with driver’s
gaze pattern learning was proposed to automatically obtain the mapping re-
lationship of driver gaze estimation. The new gaze pattern learning algorithm
was employed to gradually find typical eye gaze calibration points in a natu-
ralistic driving environment. The authors in [17] proposed a new 3-step deep
learning-based method to detect the driver's head pose class and estimate eye gaze
directions. In the first step, the driver's face was detected by a YOLO model.
In the second and third steps, CNN-based models were employed to classify
the head pose into one of seven classes and to estimate the eye directions,
respectively. Rangesh et al. [14] presented a method to improve the robustness
and generalization of driver gaze estimation on real-world data recorded under
the driver is likely to have seen the object or not, namely, when the existing
object falls inside the attentional visual field of the driver (see Fig. 5.2). In
addition to the objects that are completely located inside the attentional area
of the driver, we also need to consider the situations where one or more traffic
objects is/are located partially inside the attentional area. In such cases, we
consider the object to be partially seen by the driver. Finally, for each object
in the frame, we find the percentage of the area of the object that is inside
the attentional area. The resulting value of the Percentage of Inside Area
(PIA) for each object lies between 0 and 1. In other words, 0 means the
object is completely outside the visual field of the driver, 1 means the object
is completely inside the visual field, and any intermediate value of PIA means the
object is partially located in the visual field; obviously, higher values of
PIA mean greater overlap with the visual field of the driver. Fig. 5.3 shows
a case in which one object is partially located in the attentional area while the other
three objects are completely inside it; our method obtained PIAs of 0.28,
1, 1, and 1, respectively.
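The PIA computation just described can be sketched as follows; as a simplifying assumption for illustration, both the traffic object and the driver's attentional visual field are represented as axis-aligned rectangles (in the thesis, the attentional field is established from the driver's gaze data and need not be rectangular):

```python
def intersection_area(box_a, box_b):
    """Area of overlap between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    return max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

def pia(obj_box, attention_box):
    """Percentage of Inside Area: the fraction of the object's area that
    falls inside the driver's attentional visual field (between 0 and 1)."""
    obj_area = (obj_box[2] - obj_box[0]) * (obj_box[3] - obj_box[1])
    if obj_area <= 0:
        return 0.0
    return intersection_area(obj_box, attention_box) / obj_area
```

For example, an object whose bounding box lies half inside the attentional rectangle yields a PIA of 0.5, matching the interpretation given above.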
The overall average attention for a driver can be estimated in different ways.
We consider three different metrics each making use of the attentional data
extracted from an image in the driving sequence of a driver. These metrics
can vary between 0 and 1 and are described in the following.
Metric 1 (M1). As the first metric, separately for each of the aforementioned
object types, we compute the average PIA of the objects of that type
over all frames. Since we have four classes of objects, M1 consists of four
separate measurements, one for each class. M1 is computed for each object
type t as follows:
M1(t) = (1/N_t) Σ_i PIA_i(t), (5.1)
where N_t is the total number of objects of type t detected over all frames
and PIA_i(t) is the PIA of the i-th such object.
Metric 2 (M2). As the second metric, we find the average PIA over all
objects, ignoring the type of object. In other words, this metric works similarly
to M1 but views the four traffic object types (vehicles, traffic lights, traffic
signs, and pedestrians) as one general traffic object type. M2 is computed as
follows:
M2 = (1/N) Σ_i PIA_i, (5.2)
where N is the total number of traffic objects of any type detected over all
frames.
Metric 3 (M3). This metric, similar to M2, views the four traffic object
types as one general traffic object type but unlike M2, determines the average
area percentages of the objects which are partially or completely outside the
attentional visual area of the driver while driving. This metric can simply be
computed as follows:
M3 = 1 − M2 (5.3)
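Given per-frame PIA values, the three metrics above can be computed as in the following sketch; the data layout (a list of per-frame dictionaries mapping object type to the PIA values of the objects detected in that frame) is an assumption made for illustration:

```python
def compute_metrics(frames):
    """frames: list of dicts mapping an object type (e.g. 'vehicle') to a
    list of PIA values for the objects of that type detected in the frame.
    Returns (m1, m2, m3): m1 is the average PIA per object type, m2 the
    average PIA over all objects regardless of type, and m3 = 1 - m2."""
    per_type = {}   # object type -> all PIA values of that type
    all_pias = []   # PIA values of every object, any type
    for frame in frames:
        for obj_type, pias in frame.items():
            per_type.setdefault(obj_type, []).extend(pias)
            all_pias.extend(pias)
    m1 = {t: sum(v) / len(v) for t, v in per_type.items() if v}
    m2 = sum(all_pias) / len(all_pias) if all_pias else 0.0
    return m1, m2, 1.0 - m2
```

Multiplying the returned fractions by 100 yields the percentage values reported in Table 5.2.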
In this section, we describe our vehicle configuration, the data we used for our
experiments, and the results for six different drivers.
Our RoadLAB experimental vehicle is equipped with a non-contact gaze
tracker. This system consists of a pair of infrared stereo cameras mounted on
the dashboard, working at 60Hz. Our instrumented vehicle is equipped with
stereo cameras mounted on the vehicle’s roof to capture the forward driving en-
vironment at a rate of 30Hz. Fig. 5.4 depicts the configuration of the RoadLAB
experimental vehicle. Details concerning this configuration are described in
[1]. The instrumented vehicle was employed to record data sequences from 16
different drivers on a pre-determined 28.5km course around the city of London,
Ontario, Canada. As mentioned, we followed the techniques proposed in our
laboratory to establish the attentional visual field of the driver in the image
plane of the forward stereo vision system. These techniques have been used in
several experiments for various purposes in our laboratory [10], [21], [25], [9],
[20], [26].
The analytical results of our experiments for the drivers are provided
in Table 5.2. In this table, V, TL, TS, and P represent the object types
of vehicle, traffic light, traffic sign, and pedestrian, respectively. In general,
5.5 Conclusions
Nowadays, almost every modern vehicle is equipped with some type of ADAS,
ranging from collision avoidance systems and alcohol ignition interlock devices
to anti-lock braking and parking assistance systems. ADASs generally increase
car and road safety and assist a driver in driving tasks. In this research,
we presented an analytical model to estimate average driver attention based
on the attentional visual field of the driver using different metrics. For this,
we used the RoadLAB dataset obtained from our instrumented vehicle in our
experiments. Next, by establishing the attentional field of view of the driver
we were able to investigate the average area percentages of the traffic objects
including vehicles, traffic lights, traffic signs, and pedestrians, which are in-
side the driver gaze area while driving. By using our approach we are able
to infer the driver’s behavior in terms of the driver’s attentional visual area.
Ultimately, such an augmented approach could enable the driver's gaze
information to be integrated into an ADAS as a means to determine which objects
drivers attend to and which they do not, as well as to predict driver
maneuvers [9] and to detect driver distraction as part of future ADASs.
Table 5.2: Analytical results for the attentional visual field of the driver
Driver M1-V (%) M1-TL (%) M1-TS (%) M1-P (%) M2 (%) M3 (%)
3 61.89 67.14 56.20 52.23 61.61 38.39
8 54.92 51.02 54.46 61.27 54.67 45.33
9 56.34 38.94 34.95 39.62 48.95 51.05
12 58.38 64.39 59.53 44.79 58.52 41.48
13 49.47 46.04 38.62 34.18 46.59 53.41
15 56.60 53.94 52.32 47.80 55.07 44.93
Bibliography
[3] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. “R-fcn: Object detec-
tion via region-based fully convolutional networks”. In: arXiv preprint
arXiv:1605.06409 (2016).
[4] Jan Dürre, Dario Paradzik, and Holger Blume. “A HOG-based real-time
and multi-scale pedestrian detector demonstration system on FPGA”.
In: Proceedings of the 2018 ACM/SIGDA International Symposium on
Field-Programmable Gate Arrays. 2018, pp. 163–172.
[5] K. He, X. Zhang, S. Ren, and J. Sun. “Spatial pyramid pooling in deep
convolutional networks for visual recognition”. In: IEEE transactions
on pattern analysis and machine intelligence 37.9 (2015), pp. 1904–
1916.
[6] Sumit Jha and Carlos Busso. “Probabilistic estimation of the driver’s
gaze from head orientation and position”. In: 2017 IEEE 20th Interna-
tional Conference on Intelligent Transportation Systems (ITSC). IEEE.
2017, pp. 1–6.
[7] Sumit Jha and Carlos Busso. “Probabilistic estimation of the gaze re-
gion of the driver using dense classification”. In: 2018 21st International
Conference on Intelligent Transportation Systems (ITSC). IEEE. 2018,
pp. 697–702.
[8] Glenn Jocher, Alex Stoken, Jirka Borovec, Liu Changyu, and Adam
Hogan. “ultralytics/yolov5: v3.0”. In: Zenodo (2020).
[11] Kai Lv, Hao Sheng, Zhang Xiong, Wei Li, and Liang Zheng. “Improving
Driver Gaze Prediction with Reinforced Attention”. In: IEEE Transac-
tions on Multimedia (2020).
[12] World Health Organization et al. Global status report on road safety
2015. Tech. rep. World Health Organization, 2015.
[13] Andrea Palazzi, Davide Abati, Simone Calderara, Francesco Solera, and
Rita Cucchiara. “Predicting the Driver’s Focus of Attention: the DR(eye)VE
Project”. In: IEEE transactions on pattern analysis and machine intelligence
41.7 (2018), pp. 1720–1733.
[14] Akshay Rangesh, Bowen Zhang, and Mohan M Trivedi. “Driver gaze
estimation in the real world: Overcoming the eyeglass challenge”. In:
2020 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2020, pp. 1054–
1059.
[16] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster r-cnn:
Towards real-time object detection with region proposal networks”. In:
arXiv preprint arXiv:1506.01497 (2015).
[17] Sayyed Mudassar Shah, Zhaoyun Sun, Khalid Zaman, Altaf Hussain,
Muhammad Shoaib, and Lili Pei. “A driver gaze estimation method
based on deep learning”. In: Sensors 22.10 (2022), p. 3959.
[18] Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen,
and Xiangyang Xue. “Dsod: Learning deeply supervised object detec-
tors from scratch”. In: Proceedings of the IEEE international conference
on computer vision. 2017, pp. 1919–1927.
[21] Mohsen Shirpour, Nima Khairdoost, Michael Bauer, and Steven Beau-
chemin. “Traffic Object Detection and Recognition Based on the At-
tentional Visual Field of Drivers”. In: IEEE Transactions on Intelligent
Vehicles (2021), pp. 1–1. doi: 10.1109/TIV.2021.3133849.
[22] Paul Viola and Michael J Jones. “Robust real-time face detection”. In:
International journal of computer vision 57.2 (2004), pp. 137–154.
[23] Hyo Sik Yoon, Na Rae Baek, Noi Quang Truong, and Kang Ryoung
Park. “Driver gaze detection based on deep residual networks using the
combined single image of dual near-infrared cameras”. In: IEEE Access
7 (2019), pp. 93448–93461.
[24] Guoliang Yuan, Yafei Wang, Huizhu Yan, and Xianping Fu. “Self-
calibrated driver gaze estimation via gaze pattern learning”. In: Knowledge-
Based Systems 235 (2022), p. 107630.
[25] S.J. Zabihi, S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Detec-
tion and recognition of traffic signs inside the attentional visual field
of drivers”. In: 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE.
2017, pp. 583–588.
[26] S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Real-time driving
manoeuvre prediction using IO-HMM and driver cephalo-ocular be-
haviour”. In: Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE.
2017, pp. 875–880.
[27] Yingji Zhang, Xiaohui Yang, and Zhe Ma. “Driver’s Gaze Zone Estima-
tion Method: A Four-channel Convolutional Neural Network Model”.
In: 2020 2nd International Conference on Big-data Service and Intelli-
gent Computation. 2020, pp. 20–24.
Chapter 6
6.1 Introduction
Every year, the large number of car collisions leads to tremendous human
and economic costs [39]. According to the global status report on road safety
2018, launched by the World Health Organization (WHO) [29], approximately 1.35
million fatalities occur per year in the world because of road traffic accidents,
and up to 50 million people are injured. Road traffic injury is now the leading
cause of death among children and young people aged 5–29 years, and road
fatalities are the eighth leading cause of death across all age groups. Moreover,
drivers are less likely to be involved in an accident when one or more
passengers are present who can warn them in advance [33]. Driver error is
the main cause of road accidents. To address this, efforts are being made by
both academic and industrial groups to develop Advanced Driver Assistance
Systems (ADASs) in different respects. These systems attempt to assist the
driver's decision-making in the act of driving or even take control of the
vehicle by performing automatic actions, improving car and road safety in
general.
and increases the risk of an accident significantly [35]. Moreover, there is
debate among researchers regarding the effect of the presence of passengers
on driver performance: some concluded that passengers reduce driver mistakes
and violations [33], while others reported an increase [49], [50].
As mentioned, drowsiness detection is an important research area of driver
behavior analysis since it is one of the major reasons for road accidents. For
instance, according to the National Highway Traffic Safety Administration
(NHTSA) [1], approximately 8,000 deaths occur due to drowsy driving annually.
Methods employed for detecting driver drowsiness can be broadly grouped
into two categories: methods based on visual features and methods based on
non-visual features [20]. Methods based on visual features benefit from
computer vision techniques for the detection of drowsiness. Visual
feature-based methods attempt to extract facial features such as the face, eyes,
and mouth. These methods can be further divided into four categories: eye
state analysis [40], eye blinking analysis [19], mouth and yawning analysis
[5], and facial expression analysis [14]. Methods that use non-visual features
can be broadly divided into two categories: driver physiological analysis and
vehicle parameter analysis. The former usually refers to physiological signals
of a driver such as the electroencephalogram (EEG), electrocardiogram (ECG),
and electrooculogram (EOG) [2], [16], [32], whereas methods based on vehicle
parameter analysis detect driver drowsiness by analyzing vehicle features such
as steering wheel movement, lane keeping, the pressure exerted on the brake,
and acceleration pedal movement [3], [10].
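As one concrete illustration of the eye-state and eye-blink family of visual methods (a generic sketch, not the specific algorithms of the works cited above), the widely used eye aspect ratio (EAR) computed over six eye landmarks drops sharply when the eye closes, so a prolonged low EAR can indicate drowsiness; the 0.2 threshold below is a common choice and an assumption here:

```python
import math

def eye_aspect_ratio(landmarks):
    """landmarks: six (x, y) eye landmarks p1..p6 ordered as in the common
    68-point face model (p1/p4 are the horizontal eye corners, p2/p3 the
    upper lid, p6/p5 the lower lid).
    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)."""
    p1, p2, p3, p4, p5, p6 = landmarks
    v1 = math.dist(p2, p6)   # first vertical lid distance
    v2 = math.dist(p3, p5)   # second vertical lid distance
    h = math.dist(p1, p4)    # horizontal eye width
    return (v1 + v2) / (2.0 * h)

def is_closed(landmarks, threshold=0.2):
    """Flag a frame as 'eye closed' when the EAR falls below the threshold."""
    return eye_aspect_ratio(landmarks) < threshold
```

Counting consecutive "closed" frames then gives a simple drowsiness indicator (e.g., PERCLOS-style measures).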
In the RoadLAB research project, we utilized the FaceLAB eye tracker to
record driver gaze data. In our research group, Kowsari et al. [24] introduced
a cross-calibration technique to transform the aforementioned driver gaze data
from the reference frame of the gaze tracker onto the reference frame of a for-
ward imaging system. Moreover, the works which were presented in [22] and
[48] employed the RoadLAB gaze data to model driver behavior and predict
driver maneuvers using driver cephalo-ocular behavioral and vehicular dynam-
ics information. Also, in [37], using the gaze data, we detected and recognized
four major types of traffic objects, including vehicles, traffic signs, traffic
lights, and pedestrians, inside and outside the visual field of the driver. Finally, in
[23], we studied the driver behavior with respect to the aforementioned traffic
objects in terms of the attentional visual field of the driver. In this work,
we also attempt to investigate driver behavior in terms of his/her PoG in the
course of driving with respect to the aforementioned major classes of traffic
objects, expressed as a percentage of driving time.
PoG has fallen into the object or not. (See Figure 6.2.) To investigate driver
behavior in terms of PoG during driving with respect to traffic objects, we
present three different metrics each making use of the PoG and traffic objects
extracted from an image in the driving sequence of a driver. These metrics
can vary between 0 (i.e. 0% of driving time) and 1 (i.e. 100% of driving time)
and are explained in the following.
Figure 6.1: Two samples of PoG of the driver (the red point)
Metric 1 (M1) As the first metric, separately for each of the aforementioned
object types, we determine the fraction of frames in which the PoG has fallen
onto an object of that type. As a result, since we have four classes of objects,
M1 consists of four separate measurements, one for each class. To compute
this metric, we model the PoG as a circle with a radius of three pixels. Then,
for each frame, if the overlap between the PoG circle and an object is bigger
than a threshold, the PoG is considered to have fallen onto the object;
otherwise, we conclude that the PoG is outside the object. For our experiments,
we employed a threshold of five pixels. M1 is computed for each object type t
as follows:
M1(t) = F_t / F, (6.1)
where F is the total number of frames and F_t is the number of frames in
which the PoG has fallen onto an object of type t.
Metric 2 (M2) As the second metric, we obtain the fraction of frames in
which the PoG has fallen onto any traffic object, ignoring the type of object.
In other words, this metric works similarly to M1 but considers the four traffic
object types (vehicles, traffic lights, traffic signs, and pedestrians) as one
general traffic object type. M2 is computed as follows:
M2 = F_obj / F, (6.2)
where F_obj is the number of frames in which the PoG has fallen onto at least
one detected traffic object.
Metric 3 (M3) This metric, similar to M2, views the four traffic object
types as one general traffic object type but, unlike M2, focuses on the fraction
of frames in which the PoG of the driver has fallen outside all the detected
traffic objects while driving. In other words, in these frames the PoG does
not fall onto any of the detected traffic objects. This metric can simply be
computed as follows:
M3 = 1 − M2 (6.3)
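The three PoG-based metrics can be sketched as follows; for illustration, the PoG circle is approximated by its bounding square and the overlap threshold is interpreted as an overlap area in pixels, both simplifying assumptions rather than the thesis's exact geometry:

```python
def pog_box_overlap(pog, box, radius=3.0):
    """Approximate overlap between the PoG circle (here, its bounding
    square) and an axis-aligned bounding box (x1, y1, x2, y2)."""
    cx, cy = pog
    sq = (cx - radius, cy - radius, cx + radius, cy + radius)
    ix1, iy1 = max(sq[0], box[0]), max(sq[1], box[1])
    ix2, iy2 = min(sq[2], box[2]), min(sq[3], box[3])
    return max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

def pog_metrics(frames, threshold=5.0):
    """frames: list of (pog, objects) pairs, where pog is the (x, y) point
    of gaze and objects maps each object type to a list of bounding boxes
    detected in the frame. Returns (m1, m2, m3) as fractions of frames."""
    total = len(frames)
    hits_by_type = {}   # object type -> number of frames with PoG on it
    hits_any = 0        # frames with PoG on any detected object
    for pog, objects in frames:
        frame_types = set()
        for obj_type, boxes in objects.items():
            if any(pog_box_overlap(pog, b) > threshold for b in boxes):
                frame_types.add(obj_type)
        for t in frame_types:
            hits_by_type[t] = hits_by_type.get(t, 0) + 1
        if frame_types:
            hits_any += 1
    m1 = {t: n / total for t, n in hits_by_type.items()}
    m2 = hits_any / total if total else 0.0
    return m1, m2, 1.0 - m2
```

As with the Chapter 5 metrics, multiplying the returned fractions by 100 gives the percentage values reported in Table 6.1.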
In this section, we describe our vehicle configuration, the data we used for our
experiments, and the results for six different drivers.
Our RoadLAB experimental vehicle is equipped with a remote eye-gaze
tracker mounted on the dashboard and also stereo cameras placed on the roof
of the vehicle to record the frontal driving environment. Details related to this
configuration were explained in [4]. Figures 1.2 and 1.3 show the configuration
of the RoadLAB vehicle and the pre-determined path of driving respectively.
To investigate the PoG behavior of a driver with respect to the aforementioned
traffic objects during driving, we apply our method using a YOLOv5 model
trained on RoadLAB data and measure the three metrics described above
for each driver. Table 5.1 provides the details on the sequences that
have been gathered by different drivers for our experiments.
The analytical results of our experiments for the drivers are provided
in Table 6.1. In this table, V, TL, TS, and P stand for the object types
of vehicle, traffic light, traffic sign, and pedestrian, respectively. In general,
various factors such as driving skills, habits, experience, and driver distractions
can influence the PoG of the driver while driving. Table 6.1 shows the
estimated average percentage of driving time along the path according to the
metrics M1, M2, and M3, which are based on the PoG of the driver. For M1
and M2, which focus on the frames in which the PoG has fallen onto an object,
higher values indicate that the driver has spent a greater percentage of
his/her driving time gazing at the four types of objects along the path on
average. As mentioned, M1 includes M1-V, M1-TL, M1-TS, and M1-P for the
four different object types, while M2 considers all object types as one object
type for processing. As can be seen, considering the results related
6.5 Conclusions
Evidence has shown driver error is the main cause of road accidents. In this
research, we presented an analytical model to estimate the percentage of time
on average in which a driver gazed at different traffic objects using three met-
rics. For this, we used the naturalistic on-road RoadLAB dataset obtained
from our experimental vehicle. After obtaining the PoG of the driver, we
estimated the percentage of the experimental driving data in which the PoG
fell onto different traffic objects, including vehicles, traffic lights, traffic
signs, and pedestrians. By using our approach, we can infer the driver's
behavior in terms of the driver's PoG in the course of driving. Ultimately,
the methods presented in this work can be useful in designing a future ADAS
to understand driver intent in advance as well as to measure driver awareness
levels while driving.
Table 6.1: Analytical results for PoG of the driver with respect to the traffic objects
Driver M1-V (%) M1-TL (%) M1-TS (%) M1-P (%) M2 (%) M3 (%)
3 17.12 1.40 0.44 0.84 19.63 80.37
8 25.98 0.27 0.42 2.47 28.90 71.10
9 33.73 0.29 0.23 1.57 35.43 64.57
12 23.44 1.94 0.51 0.31 26.00 74.00
13 23.51 1.32 0.28 0.97 25.60 74.40
15 26.98 0.91 0.30 0.93 28.95 71.05
Bibliography
[3] Sadegh Arefnezhad, Sajjad Samiee, Arno Eichberger, and Ali Nahvi.
“Driver drowsiness detection based on steering wheel data applying
adaptive neuro-fuzzy feature selection”. In: Sensors 19.4 (2019), p. 943.
[7] Yougang Bian, Jieyun Ding, Manjiang Hu, Qing Xu, Jianqiang Wang,
and Keqiang Li. “An advanced lane-keeping assistance system with
switchable assistance modes”. In: IEEE Transactions on Intelligent
Transportation Systems 21.1 (2019), pp. 385–396.
[8] B. Bilger. Has the self-driving car at last arrived? The New Yorker
(2013). http://www.newyorker.com/reporting/2013/11/25/131125fa_fact_bilger?currentPage=all.
[9] Jeanne Breen. Car telephone use and road safety, an overview prepared
for the European Commission. European Commission, 2009.
[10] Meng Chai et al. “Drowsiness monitoring based on steering wheel sta-
tus”. In: Transportation research part D: transport and environment 66
(2019), pp. 95–103.
[12] Jae Gyeong Choi, Chan Woo Kong, Gyeongho Kim, and Sunghoon Lim.
“Car crash detection using ensemble deep learning and multimodal data
from dashboard cameras”. In: Expert Systems with Applications 183
(2021), p. 115400.
[15] G. Griffin, D. Kwiatkowski, and J. Miller. U.S. pat. No. 9248815. Wash-
ington, DC: U.S. Patent and Trademark Office. 2016.
[19] Jaeik Jo, Sung Joo Lee, Kang Ryoung Park, Ig-Jae Kim, and Jai-
hie Kim. “Detecting driver drowsiness using feature-level fusion and
user-specific classification”. In: Expert Systems with Applications 41.4
(2014), pp. 1139–1152.
[20] Sinan Kaplan, Mehmet Amac Guvensan, Ali Gokhan Yavuz, and Yasin
Karalurt. “Driver behavior analysis for safe driving: A survey”. In:
IEEE Transactions on Intelligent Transportation Systems 16.6 (2015),
pp. 3017–3032.
[25] Vijay Kumar, Shivam Sharma, et al. “Driver drowsiness detection us-
ing modified deep learning architecture”. In: Evolutionary Intelligence
(2022), pp. 1–10.
[26] Stéphanie Lefèvre, Ashwin Carvalho, Yiqi Gao, H Eric Tseng, and
Francesco Borrelli. “Driver models for personalised driving assistance”.
In: Vehicle System Dynamics 53.12 (2015), pp. 1705–1720.
[27] Tianchi Liu, Yan Yang, Guang-Bin Huang, Yong Kiang Yeo, and Zhip-
ing Lin. “Driver distraction detection using semi-supervised machine
learning”. In: IEEE transactions on intelligent transportation systems
17.4 (2015), pp. 1108–1120.
[28] Hermes J Mora and Esteban J Pino. “Simplified Prediction Method for
Detecting the Emergency Braking Intention Using EEG and a CNN
Trained with a 2D Matrices Tensor Arrangement”. In: International
Journal of Human–Computer Interaction (2022), pp. 1–14.
[29] World Health Organization et al. Global status report on road safety
2018: Summary. Tech. rep. World Health Organization, 2018.
[30] Sourav Kumar Panwar, Vivek Solanki, Sachin Gandhi, Sankalp Gupta,
and Hitesh Garg. “Vehicle accident detection using IoT and live track-
ing using geo-coordinates”. In: Journal of Physics: Conference Series.
Vol. 1706. 1. IOP Publishing. 2020, p. 012152.
[31] D. Parker, K. Cockings, and M. Cund. U.S. pat. No. 9682689. Wash-
ington, DC: U.S. Patent and Trademark Office. 2017.
[34] Shahram Sattar, Songnian Li, and Michael Chapman. “Road surface
monitoring using smartphone sensors: A review”. In: Sensors 18.11
(2018), p. 3845.
[37] Mohsen Shirpour, Nima Khairdoost, Michael Bauer, and Steven Beau-
chemin. “Traffic Object Detection and Recognition Based on the At-
tentional Visual Field of Drivers”. In: IEEE Transactions on Intelligent
Vehicles (2021), pp. 1–1. doi: 10.1109/TIV.2021.3133849.
[39] Gito Sugiyanto and Mina Yumei Santi. “Road traffic accident cost us-
ing human capital method (Case study in Purbalingga, Central Java,
Indonesia)”. In: Jurnal Teknologi 79.2 (2017).
[40] Chao Sun, Jian Hua Li, Yang Song, and Lai Jin. “Real-time driver
fatigue detection based on eye state recognition”. In: Applied mechanics
and Materials. Vol. 457. Trans Tech Publ. 2014, pp. 944–952.
[41] Farid Talebloo, Emad A Mohammed, and Behrouz Far. “Deep Learn-
ing Approach for Aggressive Driving Behaviour Detection”. In: arXiv
preprint arXiv:2111.04794 (2021).
[43] Qun Wang, Weichao Zhuang, Liangmo Wang, and Fei Ju. Lane keeping
assist for an autonomous vehicle based on deep reinforcement learning.
Tech. rep. SAE Technical Paper, 2020.
[44] Cheng Wei, Fei Hui, and Asad J Khattak. “Driver lane-changing behav-
ior prediction based on deep learning”. In: Journal of advanced trans-
portation 2021 (2021).
[45] Samuel Würtz and Ulrich Göhner. “Driving Style Analysis Using Re-
current Neural Networks with LSTM Cells”. In: Journal of Advances
in Information Technology 11.1 (2020).
[46] Guoliang Yuan, Yafei Wang, Huizhu Yan, and Xianping Fu. “Self-
calibrated driver gaze estimation via gaze pattern learning”. In: Knowledge-
Based Systems 235 (2022), p. 107630.
[47] S.J. Zabihi, S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Detec-
tion and recognition of traffic signs inside the attentional visual field
of drivers”. In: 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE.
2017, pp. 583–588.
[48] S.M. Zabihi, S.S. Beauchemin, and M.A. Bauer. “Real-time driving
manoeuvre prediction using IO-HMM and driver cephalo-ocular be-
haviour”. In: Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE.
2017, pp. 875–880.
[50] Lanfang Zhang, Boyu Cui, Minhao Yang, Feng Guo, and Junhua Wang.
“Effect of using mobile phones on driver’s control behavior based on
naturalistic driving data”. In: International journal of environmental
research and public health 16.8 (2019), p. 1464.
[51] Yingji Zhang, Xiaohui Yang, and Zhe Ma. “Driver’s Gaze Zone Estima-
tion Method: A Four-channel Convolutional Neural Network Model”.
In: 2020 2nd International Conference on Big-data Service and Intelli-
gent Computation. 2020, pp. 20–24.
[52] Kawtar Zinebi, Nissrine Souissi, and Kawtar Tikito. “Driver Behav-
ior Analysis Methods: Applications oriented study”. In: Proceedings of
the 3rd International Conference on Big Data, Cloud and Application
(BDCA 2018). 2018.
Chapter 7
Evidence has shown that drivers play a crucial role in most driving events,
and a significant number of vehicle accidents are due to driver error. Hence,
researchers and vehicle manufacturers are making efforts to analyze and model
driver behavior with different views in different driving situations as well as
to predict the most probable next maneuver and assist the driver in avoiding
unsafe maneuvers. In Chapter 2, we developed a deep learning-based model
to predict five types of driver maneuvers. For this, our model benefited from
driver cephalo-ocular behavioral and vehicular dynamics information to do its
task. Our experimental results in this work showed that our LSTM-based
model outperformed the traditional IO-HMM-based model. In order to prevent
potential accidents, such a system offers a possible solution for allowing
an ADAS to alert the driver at an early stage, before he/she makes a mistake
and performs a dangerous maneuver.
detects and recognizes four important classes of traffic objects including vehi-
cles, pedestrians, traffic signs, and traffic lights inside and outside the atten-
tional visual area of the driver. The object detection stage was constructed by
a combination of both traditional and deep learning-based models. Finally, the
recognition stage was implemented using ResNet101 models. Nowadays, ob-
ject detection is widely employed in designing ADASs for not only autonomous
driving but also ordinary vehicles. For example, the detection of vehicles can
avoid accidents and keep a safe distance from surrounding vehicles. Pedestrian
detection is significant in reducing fatalities and injuries. Recognition of traffic
signs and lights helps vehicles to comply with traffic rules.
In Chapter 4, we presented a CNN-based model to detect and verify lane
types in urban and suburban driving environments. We classified various types
of lanes as they provide contextual information and indicate traffic rules rel-
evant to driving. Following the detection stage, we used a two-step method
to classify the lane boundaries into eight classes, considering road boundaries
as one particular type of lane. These mechanisms can help us in designing
ADAS applications such as lane keeping assistance, lane departure warning,
overtaking assistance as well as intelligent cruise control.
It is generally accepted that a driver cannot attend to the whole traffic
environment because of his/her limited gaze area. Moreover, a driver may
miss some critical information because of inappropriate driving habits, driving
skills, or distractions that affect the choice of proper driving maneuver. In
Chapter 5, we developed an analytical vision-based model to estimate average
driver attention based on the attentional visual field of the driver by employing
several metrics. For this purpose, we also trained a YOLOv5 object detector
model on RoadLAB data to identify traffic objects. By utilizing our approach
and considering consecutive small periods of time while driving, it is possible to
design an ADAS based on the driver’s attentional visual area to infer whether
the driver is paying enough attention to the traffic objects or whether he/she
has been distracted.
In Chapter 6, we presented an approach to measure an average percentage
of the time that a driver has gazed at different traffic objects in the course of
driving. To reach this purpose, we benefited from a YOLOv5 object detector
trained on RoadLAB data, PoG of the driver as well as our proposed metrics.
This approach helps us to understand the driver’s behavior in terms of the
driver’s PoG during driving.
Our contributions to the creation of next generation ADAS are summarized
as follows:
3. Collecting and annotating a large dataset for different traffic objects and
road lanes.
in the future.
1. Objects detected inside the attentional visual field of driver can be em-
ployed to analyze driver attention in consecutive small periods of time
while driving instead of considering an entire sequence as one time unit.
For this, it is possible to define a sliding time period and compute driver
attention based on the visual field and investigate in what locations and
in what driving situations a driver strengthens his/her attention or is
distracted. Moreover, it would also be of interest to monitor and analyze
the driver's behavior using a dashboard camera observing the driver's
activities during driving to automatically detect driver distraction. More
specifically, by analyzing the face and hands of the driver, such ADASs
could detect driver distraction and identify the cause of the distraction,
such as talking on a cell phone, texting, operating the radio, eating, etc.
As a result, a future ADAS that incorporates the two aforementioned
methodologies, taking advantage of both, would be more practical and
promising for detecting driver distraction in real driving environments.
2. To make the object detector model more comprehensive, bicycle and
motorcycle objects could be added to the dataset as well. As a result, it
would be possible to identify more of the objects drivers encounter and
attend to while driving.
3. Employing a digital street map along with the vehicle’s GPS coordinates
can provide an intelligent ADAS with more contextual information. In
other words, augmenting the vehicle’s GPS coordinates with the street
map could enable the ADAS to detect upcoming road artifacts such as
4. By employing the video from the forward imaging system and identifying
the side lanes in addition to the ego lane, a future ADAS could determine
whether a lane exists on the right side and also on the left side of the
vehicle. This contextual information would provide additional information
for the driver maneuver prediction system. For instance, when the vehicle
is moving in the left-most lane, the only safe maneuvers are going straight
and a right lane change, unless it is approaching an intersection.
Publications
[8] Nima Khairdoost, Mohammad Reza Baghaei Pour, Seyed Ali Moham-
madi, and Mohammad Hoseinpour Jajarm. “A ROBUST GA/KNN
BASED HYPOTHESIS VERIFICATION SYSTEM FOR VEHICLE
DETECTION”. In: International Journal of Artificial Intelligence &
Applications 6.2 (2015), p. 21.
[9] Hoda Khoshnevis and Nima Khairdoost. “Using Simulation Applica-
tions for Sustainable Design and Construction”. In: ISARC. Proceed-
ings of the International Symposium on Automation and Robotics in
Construction. Vol. 33. IAARC Publications. 2016, p. 1.
[10] Hadi Sarvari, Nima Khairdoost, and Abdolvahhab Fetanat. “Harmony
search algorithm for simultaneous clustering and feature selection”. In:
2010 International Conference of Soft Computing and Pattern Recog-
nition. IEEE. 2010, pp. 202–207.
[11] Mohsen Shirpour, Nima Khairdoost, Michael Bauer, and Steven Beau-
chemin. “Traffic Object Detection and Recognition Based on the At-
tentional Visual Field of Drivers”. In: IEEE Transactions on Intelligent
Vehicles (2021), pp. 1–1. doi: 10.1109/TIV.2021.3133849.
[12] Alireza Tofighi, Nima Khairdoost, S Amirhassan Monadjemi, and Ka-
mal Jamshidi. “A robust face recognition system in image and video”.
In: International Journal of Image, Graphics and Signal Processing 6.8
(2014), p. 1.