
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3058205, IEEE Access.

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Active Vision-based Attention Monitoring System for Non-Distracted Driving

LAMIA ALAM 1 (Member, IEEE), MOHAMMED MOSHIUL HOQUE 1 (Senior Member, IEEE), M. ALI AKBER DEWAN 2 (Member, IEEE), NAZMUL SIDDIQUE 3 (Senior Member, IEEE), INAKI RANO 4, AND IQBAL H. SARKER 1 (Member, IEEE)
1 Department of Computer Science & Engineering (CSE), Chittagong University of Engineering & Technology (CUET), Chattogram-4349, Bangladesh (e-mail: [email protected]; [email protected]; [email protected])
2 School of Computing and Information Systems, Faculty of Science and Technology, Athabasca University, Alberta, Canada (e-mail: [email protected])
3 School of Computing, Engineering and Intelligent Systems, Ulster University, UK (e-mail: [email protected])
4 The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Denmark (e-mail: [email protected])
Corresponding author: Mohammed Moshiul Hoque (e-mail: [email protected]).

ABSTRACT Inattentive driving is a key cause of road accidents, causing more deaths than speeding or drunk driving. Research efforts have been made to monitor drivers' attentional states and to provide support to drivers. Both invasive and non-invasive methods have been applied to track drivers' attentional states, but most of these methods either use specialized equipment that is costly or use sensors that cause discomfort. In this paper, a vision-based scheme is proposed for monitoring the attentional states of drivers. The system comprises four major modules: cue extraction and parameter estimation, monitoring and decision making, level of attention estimation, and an alert system. The system estimates the attentional level and classifies the attentional states based on the percentage of eyelid closure over time (PERCLOS), the frequency of yawning, and gaze direction. Various experiments were conducted with human participants to assess the performance of the suggested scheme, which demonstrates the system's effectiveness with 92% accuracy.

INDEX TERMS Computer vision, attentional states, attention monitoring, human-computer interaction,
driving assistance, gaze direction

I. INTRODUCTION
Global surveys suggest that drivers' lack of attention is the major cause of road accidents [1]. Every year, more than 1.35 million people die and tens of millions more are injured or disabled due to road accidents. Moreover, the death rate is three times higher in low-income countries than in high-income countries. It is the responsibility of the driver to keep the attentional state high during driving for the safety of the passengers and the driver him/herself. The attentional state represents the physical, physiological, and behavioral parameters of the driver [2]. The attentional state can be lowered by various distracting activities, which include using a cell phone for texting or talking, glancing away from the road due to mind wandering, or sleepiness resulting from a lack of rest, prolonged mental activity, or a long period of stress or anxiety [3]. It has been found that factors such as fatigue, drowsiness, and distraction impede the drivers' ability to pay attention to the road and surroundings, which results in most road traffic crashes [4]. Monitoring drivers' attention with a smart driving assistance system would reduce the risk of road crashes and may help to improve driving efficiency.

There are various ways to track a driver's attentional state, using either invasive or non-invasive methods. In invasive methods, sensors are often used to analyze the driver's physiological state and driving performance. Physiological signals include heart rate (electrocardiogram or ECG/EKG signal), brain activity (electroencephalography or EEG signal), muscle current (electromyography or EMG signal), respiratory rate variability (RRV), the eye's cornea-retinal standing potential (electrooculography or EOG), and skin conductance (electrodermal activity signal). These signals are often collected through electrodes connected to the human body. Also, various in-vehicle sensory or external devices, such as accelerometers, gyroscopes, and magnetometers, are used to assess driving performance by acquiring data on steering wheel angle, brake or acceleration status, lane-position changes, and so on.
Most of these sensors require direct contact with the skin, which is intrusive for the users and often provides distorted information [5].

On the other hand, the non-invasive methods do not require any contact with the body, and the imaging sensor can collect attentional information of the subject from a distance. Vision-based attentional information includes facial features and body movements. The most used non-invasive attentional cues in currently available vision-based systems are eyelid movements (e.g., eye blink frequency, closure duration) [6]–[11], eye gaze [6]–[8], [12], head movement [7]–[9], [13], [14], facial expressions (e.g., yawning, lip movements, etc.) [7], [11], [13], [15], and body movements (like hand movement) [15]–[17]. However, these existing vision-based systems have some limitations: (i) the capturing sensors used by the aforementioned systems are either expensive camera(s) [7]–[10], [14], [15], [17] with additional sensors/hardware [11], [12], [16], [17] or some specialized imaging sensor (e.g., eye tracker [6] and Kinect [13]); (ii) some of the systems used only single parameters such as pupil [12], PERCLOS [10], and head pose [14] to estimate the driver's attentional state, making the system unable to adapt to some situations that are common in the real driving scenario (for example, turning the head or wearing sunglasses can hide the eyes) and resulting in incorrect attentional state detection; (iii) some existing systems detect a single inattentional state [8]–[14], [17] or are limited only to the level of the same state [6], whereas some others focus on detecting the driver's activity [15], [16]; (iv) some systems do not have any alert system to warn the driver when an inattentional state is detected during driving [6]–[8], [10], [11], [14]–[16]; (v) some of the previous works [11], [15], [17] were evaluated in a simulated environment and may not work accurately in the real driving scenario; and (vi) no evidence was provided about the systems mentioned above working in diverse situations (e.g., drivers having different facial features such as beard, moustache, and hairstyles, or wearing accessories such as spectacles, sunglasses, and caps), which are common in real driving scenarios.

In this paper, we intend to overcome the above limitations. We propose a vision-based system that extracts the driver's attentional cues/features to estimate the attentional state and classifies it into attentive, drowsy, fatigued, and distracted. The system also alerts the driver in any of the inattentional states such as drowsiness, fatigue, or distraction. We found that fatigue, drowsiness, and visual distraction are major causes of inattentiveness usually encountered during unsafe driving, and that there is a strong correlation between fatigue, drowsiness, visual distraction, and drivers' facial cues [18]. Among the non-invasive attentional cues mentioned earlier, we also found the percentage of eyelid closure over time (PERCLOS) [19], the frequency of yawning [20], and gaze direction [21] to be the most useful indicators for monitoring drivers' attention. Thus, in this work, we have used these parameters to estimate the driver's attentional state. The proposed framework is similar in concept to our previous framework developed in [7]. However, the main differences between them are in terms of function and data. Our previous framework was developed for real-time attentional state detection only and was assessed with a limited number of tests. This framework is developed for vision-based attention monitoring of drivers in real time. This work demonstrates how the proposed framework classifies the driver's attentional state and measures the level of attention by extracting visual cues. Moreover, the proposed system was evaluated in a real driving scenario under diverse situations (i.e., drivers having different facial features such as beard, moustache, and hairstyles, or wearing accessories), which proves its efficiency and accuracy.

A number of approaches have been taken to predict the driver's attention state to provide traffic safety and to reduce the number of accidents. However, most of the available systems for driver attention monitoring are either expensive or limited to special high-end car models. These systems are not affordable for drivers in low-income or developing countries. Thus, an attention monitoring system should be developed as a smart driving assistant that maintains a good balance between affordability and functionality. An effective and user-friendly system can save people's lives. The major contributions of this work are:
• Propose a vision-based framework that can constantly track the attentional states and level of attention of the driver.
• Develop an awareness system by generating an alarm for the driver if an inattentional state is detected.
• Evaluate the performance of the proposed framework in real driving scenarios.
The rest of the paper is organized as follows: Section II presents the related work. Section III provides a brief description of the proposed attention monitoring system. Section IV presents a number of experiments with corresponding results and discusses future work. Finally, in Section V, the paper is concluded with a brief summary.

II. RELATED WORK
Attention is an important activity of the brain that decreases the information flow into the brain's sensory system. It enhances the relevant or vital parts of the input stream and discards disruptions [22]. Zivony et al. [23] investigated spatial attention, which endorses high-level processing, and also identified some boundary conditions of attentional engagement. Their findings suggested that eye blinks interrupted attentional engagement, whereas attentional capture (shifting) was unaffected. Benedetto et al. [24] also suggested blink duration (BD) and blink rate (BR) as more profound and trustworthy indicators of the driver's visual workload.

In the past decade, detection of drivers' attention has become an active research field. A broad review of different approaches for attention detection has been reported in [5]. These approaches are grouped into five categories: subjective, physiological, vehicle-based, visual behavioral, and hybrid.
Subjective approaches involve detecting the driver's inattention through questionnaires, and the feedback is collected as rating scores [25]. However, the subjective approaches are not effective in detecting the driver's inattention in real time; rather, they are effective for cross-validating the accuracy of other approaches. Physiological approaches depend on some vital information such as heart rate, brain activity, skin conductance, etc. These approaches typically detect hypervigilance in a simulated human-machine system based on physiological signals such as RRV, ECG, EEG, and EOG [26]–[30]. These systems either use wires and electrodes running from the driver to the system, causing distraction and annoyance to the drivers [29], or expensive wearable respiratory inductive plethysmography (RIP) bands [26], wireless headsets [27], [28], and EEG acquisition equipment [30]. Vehicle-based approaches involve evaluating driving behaviour, such as steering wheel movements, changes in acceleration/speed, lane-position changes, and braking patterns over time, to detect inattentiveness [31], [32]. Detecting inattentiveness based on driving behavior is not fairly reliable because the level of errors may vary from person to person. The visual behavior-based approach involves extraction of visual features of the driver and has been used widely and effectively to detect the driver's inattentiveness [33]. For example, Ramírez et al. [34] and Takemura et al. [35] proposed methods that use head-mounted sensors. A few studies have been conducted recently for driver attention detection by integrating various approaches, for example, combining the driver's physiological signals and visual signals [36], the driver's physiological signals and driving contexts [37], or the driver's visual cues and driving patterns [38].

Both researchers and drivers found non-intrusive vision-based systems appealing for attentional state detection and monitoring. Recently, Gumaei et al. [16] developed a useful camera-based framework for real-time identification of drivers' distraction by using two different deep learning models, a custom deep convolutional neural network (CDCNN) model and a visual geometry group-16 (VGG16)-based fine-tuned model, and classified drivers' behaviour into 10 categories. An Interwoven Convolutional Neural Network (InterCNN) was proposed by Zhang et al. [15] to classify driver behaviors in real time. This system can classify 9 different behaviors, which are the most frequently encountered cases of distracted behavior during driving. In [6], a fatigue detection system was proposed that classifies the behavior into three categories, namely normal, warning, and danger, based on eye gaze in real time. Tran et al. [17] proposed a real-time driver distraction detection system that is able to identify 10 types of distractions through multiple cameras and a microphone and also alerts the driver through a voice message. Alam et al. [7], [8] proposed a system that estimates the attentional states of the driver based on various visual cues. Shibli et al. [9] estimated the level of attention and detected fatigue during driving by assessing the eye aspect ratio (EAR) and head pose. Chien and Lee [12] proposed a system to detect situations in which the driver's eyes exhibit distraction for a long duration and to generate an alarm. Mandal et al. [10] employed a classifier to identify the driver's state based on PERCLOS. Chowdhury et al. [13] proposed a framework to estimate the driver's attention in terms of facial angle and lip motion. Vicente et al. [14] reported a system based on head pose and gaze estimation that detects Eyes off the Road (EoR). Most of these studies investigated either distraction or sleepiness using only one or two visual parameters to detect the driver's attention. To solve the inconvenience caused by physiological approaches, in [11] a vision-based physiological signal measurement system was proposed to estimate driver fatigue. The system uses only one camera to collect physiological information (e.g., the remote photoplethysmography (rPPG) signal) to estimate heart rate (HR) and pulse rate variability (PRV), and facial features to estimate the percentage of eyelid closure (PERCLOS) and the yawning state, in order to measure the fatigue state of the driver. The system was developed and tested in a controlled indoor simulated driving environment with sufficient light to avoid interference from the external environment and ambient light, using a high-resolution camera. The recommended condition for rPPG estimation requires good lighting together with a high-resolution, uncompressed camera. Therefore, when the lighting condition is flawed in a real driving scenario, an rPPG-based system may not function properly.

Currently available systems are either expensive and limited to special high-end car models, or affordable solutions that lack accuracy and robustness. That is what motivated us to focus on implementing a driver attention monitoring system that bridges the gap between affordability and availability on the one hand and functionality on the other. In this research, we focused on developing a vision-based system that extracts the driver's attentional cues/features and classifies them into attentive, drowsy, fatigued, and distracted. The system also alerts the driver in the event of any inattentional state such as drowsiness, fatigue, or distraction.

III. PROPOSED ATTENTION MONITORING FRAMEWORK
The primary goal of this research is to develop a system that determines the attentional states of the driver during driving. In this work, we considered cues related to the eyes, mouth, and head region. Fig. 1 demonstrates the schematic illustration of the proposed attention monitoring framework.

Drivers' attention monitoring starts with capturing video input of the driver's frontal face for visual cues using a general-purpose webcam (Logitech C170) placed at a distance of 0.6 m to 0.9 m from the driver's face on the vehicle fascia (as shown in Fig. 2). The captured video sequence is sent to the next module for further processing.

Isolating the driver's face region throughout the monitoring process is the first important step. The video sequence is divided into frames, each frame is converted into gray scale, and then face detection and tracking are performed. The Viola-Jones face detection algorithm [39] is employed to detect faces. For each face in the frame, the algorithm returns the position of the detected face as a rectangle. When more than one face is detected, the largest rectangle (closest face) is determined using the integral image and is marked as the driver's face.
As we need to extract cues from the face region to characterize the attentional status, analysis of the face region needs to be done for each frame. But detecting the face in every frame is computationally expensive, so we implemented the face tracking algorithm proposed by Danelljan et al. [40] once the face is detected. For each frame, a correlation tracker is used to keep a record of the tracking quality. Depending on the tracking quality, a rectangle is drawn around the face to indicate that the tracker is following the face. Face detection and tracking start again if the face is lost or the tracking quality falls. Fig. 3 shows the output from this module. The detected face region is then sent to the cue extraction and parameter estimation modules.

FIGURE 1: Schematic representation of the driver's attention monitoring framework.
FIGURE 2: Webcam on the vehicle fascia for capturing facial cues.
FIGURE 3: Output of the system when (a) detecting and (b) tracking the face.
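The detect-then-track loop just described can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes OpenCV's Haar-cascade frontal-face model as the Viola-Jones detector, dlib's correlation_tracker (an implementation of the Danelljan et al. tracker), and an empirically tuned quality threshold for deciding when to re-detect.

# Sketch of the detect-then-track loop (assumptions noted above, not the authors' code).
import cv2
import dlib

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
tracker = dlib.correlation_tracker()
tracking = False
QUALITY_THRESHOLD = 7.0   # assumed re-detection threshold, tuned empirically

def driver_face(frame_bgr):
    """Return the driver's face as (x, y, w, h), or None if no face is found."""
    global tracking
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    if tracking:
        quality = tracker.update(frame_bgr)          # tracking-quality score
        if quality >= QUALITY_THRESHOLD:
            pos = tracker.get_position()
            return (int(pos.left()), int(pos.top()),
                    int(pos.width()), int(pos.height()))
        tracking = False                             # quality fell: re-detect
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest = closest face
    tracker.start_track(frame_bgr, dlib.rectangle(int(x), int(y),
                                                  int(x + w), int(y + h)))
    tracking = True
    return (int(x), int(y), int(w), int(h))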

A. CUES EXTRACTION AND PARAMETER ESTIMATION
PERCLOS, yawn frequency, and gaze direction are the visual cue-based parameters used to estimate the driver's attentional state. These parameters require the extraction of attentional cues from face regions such as the eyes, mouth, and head. Cues indicating drowsiness, fatigue, and distraction appear mostly in the eye regions. Thus, cues extracted from the eye regions are used to estimate the parameters PERCLOS and eye gaze. Another key cue of fatigue is excessive yawning, which is extracted from the mouth region to estimate the yawn frequency. Usually, the normal driving head/face orientation is frontal. Therefore, the deviation of the head/face orientation from the standard direction for a substantial time is classified as distraction. The cues related to the head position are used to estimate gaze direction. However, the face detector module returns only the face region, from which we can infer only an approximate orientation. A facial landmark detector proposed by Kazemi et al. [41], trained on the iBUG 300-W data set [42], is used to extract the additional information. A visualization of the 68 facial landmarks from the iBUG 300-W dataset is given in Fig. 4.

FIGURE 4: Visualization of the 68 facial landmarks.

For each frame, the face region is obtained from the face detection and tracking module, the facial landmark detector is used to locate salient regions of interest (ROI) of the face to extract the cues, and the cues are partitioned into three broad classes depending on the parameters PERCLOS, yawn frequency, and gaze direction to estimate the driver's attentional states.

1) Estimation of PERCLOS
PERCLOS is defined as the percentage of the duration of the eye being closed over a time interval T1 (60 seconds), excluding eye blinks, and is given by (1).

PERCLOS = (t / T1) × 100%,   (1)

where t is the duration of the closed-eye state. Drowsiness and fatigue can also be detected from PERCLOS and the duration of the closed-eye state. The duration of the closed-eye state is determined from the eye state (open or closed). The eye state is also used to detect eye blinks and to measure PERCLOS continuously over overlapping time windows of one minute.
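A hedged sketch of this sliding-window computation is given below. The per-frame timestamps, frame durations, and closed-eye flags are assumed inputs (they come from the eye-state step described next), and the exclusion of ordinary blinks, i.e., very short closures, is omitted for brevity.

# Sliding 60 s window for eq. (1); blink exclusion omitted for brevity.
from collections import deque

T1 = 60.0                    # window length in seconds
_window = deque()            # entries: (timestamp, eye_closed, frame_duration)

def update_perclos(timestamp, eye_closed, frame_duration):
    """Return PERCLOS (%) over the last T1 seconds."""
    _window.append((timestamp, eye_closed, frame_duration))
    while _window and timestamp - _window[0][0] > T1:
        _window.popleft()    # drop frames that fell out of the window
    closed_time = sum(dur for _, closed, dur in _window if closed)
    return (closed_time / T1) * 100.0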
For each frame f, the eye state is estimated by the measure of the eye aspect ratio (EAR). The EAR is an estimate of the eye-opening state [43]. The EAR stays approximately constant in the open-eye state but quickly drops towards 0 when the eye is closed. The EAR expresses the relation between the width and height of the eye as a proportion. A threshold value has been set based on the definition of the closed-eye state, i.e., the eyes are 80% or more occluded. Fig. 5 shows the six points of the right eye, taken from Fig. 4 and extracted by the landmark detector, that are used to calculate the EAR. After the detection of the six (x, y)-coordinates of the right and left eye, the points of each eye are passed individually to estimate the EAR. The EAR for each eye i is calculated using (2).

EAR_i = (|P2 − P6| + |P3 − P5|) / (2 |P1 − P4|),   (2)

where P1, ..., P6 are the 2-D points rendered in Fig. 5.

FIGURE 5: Points related to the right eye region localized by the facial landmark detector.

The average EAR of both eyes is calculated using (3).

EAR = (EAR_Right + EAR_Left) / 2.   (3)
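A minimal sketch of (2)-(3) is shown below. The dlib 68-landmark indices (36-41 and 42-47 for the two eyes) and the 0.2 closed-eye threshold are common conventions assumed here; the paper itself only states that the threshold corresponds to 80% or more occlusion.

# EAR per eq. (2)-(3) from a 68x2 array of facial landmarks (assumed layout).
import numpy as np

def eye_aspect_ratio(pts):
    """pts: 6x2 array of one eye's landmarks P1..P6, ordered as in Fig. 5."""
    p1, p2, p3, p4, p5, p6 = pts
    return (np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)) / \
           (2.0 * np.linalg.norm(p1 - p4))

def eyes_closed(landmarks, closed_threshold=0.2):
    """Return True if the averaged EAR of eq. (3) indicates a closed-eye frame."""
    right = eye_aspect_ratio(landmarks[36:42])
    left = eye_aspect_ratio(landmarks[42:48])
    return (right + left) / 2.0 < closed_threshold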

2) Estimation of Yawn Frequency Ratio (RM ) of the width (WM ) to the height (HM ) of M R
Excessive yawning is associated with the fatigue. Yawning describes the rate of increase and it used as an indication of
frequency (Y F ) for a time widow of 60 seconds can be yawning. Therefore, RM can be defined by (7).
estimated by detecting the yawning and incriminating the
HM
corresponding counter (Y N ) using the equation defined in RM = . (7)
WM
(4).
Y F = Y NT2 . (4) RM is low (i.e., (≤ T hY )) when mouth is closed and vice-
verse. Here, threshold value T hY is calculated empirically.
where T2 is the time window. Estimation of yawn frequency Driver is considered to be yawning if a significant number of
need to isolate the mouth region (MR) from the rest of the successive frames (e.g. for 3s) of mouth state is found open.
image (see Fig. 6) and is done using equations (5)-(6). Y N denotes the number of yawning which is initially 0 and
x + w4 is incremented whenever yawn is detected.
   
x1
= , (5)
y1 y + 11h
16
3) Estimation of Gaze Direction
x + 3w
   
x2 4 Gaze direction (GD) is used to detect distraction state of
= , (6)
y2 y+h the driver. Both face direction (F D) and eye gaze direction
where (x, y) is initial point of detected face; w and h are the (EGD) are taken into account to estimate GD. Usually,
width and height. standard GD of a driver should be frontal. Deviation from the
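The following sketch combines these pieces under the stated assumptions: the inner-mouth blob is segmented with an Otsu threshold, RM is taken from the blob's bounding box as in (7), and YN is incremented when the mouth stays open for roughly 3 s of frames. The ThY default and the frame rate follow Table 1 and the text; the class and its API are illustrative only, not the authors' implementation.

# Illustrative yawn counting from R_M (eq. (7)); not the authors' implementation.
import cv2

def mouth_ratio(mouth_roi_gray):
    """R_M = H_M / W_M of the dark inner-mouth blob, segmented via Otsu."""
    _, seg = cv2.threshold(mouth_roi_gray, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(seg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0.0
    _, _, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return h / float(w) if w else 0.0

class YawnCounter:
    def __init__(self, th_y=0.04, fps=10, open_seconds=3.0):
        self.th_y = th_y
        self.min_open_frames = int(fps * open_seconds)
        self.open_run = 0
        self.yn = 0                       # number of yawns, initially 0

    def update(self, r_m):
        """Increment YN once R_M has stayed above Th_Y for about 3 s of frames."""
        if r_m > self.th_y:
            self.open_run += 1
            if self.open_run == self.min_open_frames:
                self.yn += 1
        else:
            self.open_run = 0
        return self.yn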
3) Estimation of Gaze Direction
Gaze direction (GD) is used to detect the distraction state of the driver. Both the face direction (FD) and the eye gaze direction (EGD) are taken into account to estimate GD. Usually, the standard GD of a driver should be frontal. Deviation from the typical position for a long period of time (T3) is an indication of distraction. GD can be calculated by (8).

GD = {FD, EGD}_T3.   (8)

Eye centers are detected first to estimate the GD. To simplify the eye center detection, some preprocessing is performed on both eyes individually.
First, the eye regions are isolated utilizing the six (x, y)-coordinates (see Fig. 5). Then, the eye regions are resized using bilinear interpolation. To improve the contrast of the image, histogram equalization is performed, and skin pixels are eliminated by setting a threshold depending on the highest count and the dimensions of the equalized image. Finally, erosion followed by dilation is performed for noise removal. Fig. 8 shows the detection process for the right eye.

FIGURE 8: Eye center detection processes.

After the preprocessing of the eye image, the visible eyeball area is considered as an ellipse. To identify the outermost border of this ellipse, the border-following algorithm by Suzuki and Abe [45] is used, and then the Douglas-Peucker approximation algorithm [46] is utilized to reduce the number of points in the curve. Then, the center (x̄, ȳ) of the ellipse is estimated using moments [47] and is used to indicate the center of the eye. Once the centers of both eyes are calculated, EGD is categorized into right (θ > 8°), left (θ < −8°), and front (otherwise) in terms of a threshold on the angle θ. Here, θ = tan⁻¹(Δx/Δy), with Δx = x̄_R − x̄_L and Δy = ȳ_R − ȳ_L, where (x̄_L, ȳ_L) and (x̄_R, ȳ_R) correspond to the left and right eye centers, respectively. The blue lines that start from the center of each eye and connect together at the other end represent the estimated EGD in Fig. 9b.
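The steps just listed can be sketched as follows, assuming the preprocessing above has already produced a binary eye image: cv2.findContours provides the Suzuki-Abe border following, cv2.approxPolyDP the Douglas-Peucker simplification, and image moments give the center. The classification function mirrors the text's θ = tan⁻¹(Δx/Δy) definition literally; the function names and the 0.01 approximation factor are illustrative assumptions.

# Eye-center detection and EGD classification (illustrative sketch).
import math
import cv2

def eye_center(binary_eye):
    """Center of the largest contour via moments; None if nothing is found."""
    contours, _ = cv2.findContours(binary_eye, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    c = max(contours, key=cv2.contourArea)
    c = cv2.approxPolyDP(c, 0.01 * cv2.arcLength(c, True), True)  # Douglas-Peucker
    m = cv2.moments(c)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

def egd_label(center_left, center_right, threshold_deg=8.0):
    """Classify EGD as 'left', 'right' or 'front' from the two eye centers."""
    dx = center_right[0] - center_left[0]
    dy = center_right[1] - center_left[1]
    theta = math.degrees(math.atan2(dx, dy))   # theta = arctan(dx / dy), as in the text
    if theta > threshold_deg:
        return "right"
    if theta < -threshold_deg:
        return "left"
    return "front"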
Head orientation plays an important role in indicating the driver's attention state. It is combined with the eye gaze direction to determine the person's field of view. Head orientation is estimated by detecting the 15 points marked as red dots in Fig. 9a. At first, the system detects the 15 points (Fig. 9b), and then the standard solution of the Perspective-n-Point (PnP) problem [48] is used to determine the head orientation, which can be represented as (9).

h = (r, t)^T,   (9)

where h is the 3-D head orientation comprising the rotations r = (r_x, r_y, r_z)^T and the translations t = (t_x, t_y, t_z)^T. The 3-D axes are represented in red, green, and blue in Fig. 9b. The perspective transformation is performed by (10).

s [p, 1]^T = A [R | t] P^T,   (10)

where s is a scaling factor, A is the camera matrix, and [R | t] is the joint rotation-translation matrix. The rotation matrix R corresponding to the rotation vector r = (r_x, r_y, r_z) is determined using the Rodrigues rotation formula in (11).

R = cos θ · I + (1 − cos θ) · r r^T + sin θ · [r]×,   (11)

where [r]× is the skew-symmetric matrix with rows (0, −r_z, r_y), (r_z, 0, −r_x), and (−r_y, r_x, 0), I is the 3×3 identity matrix, and θ = ‖r‖₂.

FIGURE 9: (a) Points used to calculate the head pose, marked with red dots, and (b) the detected points and the GD estimated by the system when EGD and FD are front.

The Euler yaw angle (α) is obtained from the vector r to estimate the face direction (FD) as left (−90° ≤ α < −30°), right (30° < α ≤ 90°), or front (otherwise). Our system is operative for an α range of [−90°, +90°] centered at frontal, and the FD detection rate drops significantly when α exceeds this range. Fig. 9b shows an example of gaze estimation by the proposed system. Table 1 summarizes the notations used to estimate the parameters, along with the observation times.
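A minimal sketch of this head-pose step with OpenCV's solvePnP and Rodrigues routines is given below. The 3-D face model for the 15 landmarks and the camera matrix A must be supplied by the caller, and reading the yaw from cv2.RQDecomp3x3 is one common convention rather than the paper's exact procedure.

# Head pose via PnP (eq. (9)-(11)) and FD classification from the yaw angle alpha.
import cv2
import numpy as np

def face_direction(model_points_3d, image_points_2d, camera_matrix):
    """model_points_3d: Nx3 generic face model; image_points_2d: Nx2 landmarks."""
    dist_coeffs = np.zeros((4, 1))                    # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(model_points_3d, image_points_2d,
                                  camera_matrix, dist_coeffs)
    if not ok:
        return None, None
    rot_mat, _ = cv2.Rodrigues(rvec)                  # rotation matrix R, eq. (11)
    euler_deg, *_ = cv2.RQDecomp3x3(rot_mat)          # (pitch, yaw, roll) in degrees
    alpha = euler_deg[1]                              # Euler yaw angle
    if -90 <= alpha < -30:
        return "left", alpha
    if 30 < alpha <= 90:
        return "right", alpha
    if -30 <= alpha <= 30:
        return "front", alpha
    return None, alpha                                # outside the operative range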
B. DRIVER'S STATE OF ATTENTION ESTIMATION
The estimated parameters PERCLOS, YF, and GD are used to estimate the driver's state of attention (SoA). PERCLOS is considered to be the most effective visual feature-based measurement of the driver's attentional state in terms of sleepiness detection. Table 2 presents the classification of the driver's attentional states, which was experimentally found by Jimenez et al. [49] on a time window of one minute.
TABLE 1: Parameters used by the driver's attentional state monitoring system, with notation and observation time.
Parameter | Notation | Details | Time (sec)
PERCLOS | t | Duration of closed-eye state | T1 = 60
Yawn Frequency | ThY | 0.04 | T2 = 60
Gaze Direction | FD, EGD | Right / Left / Front | T3 = 2

TABLE 2: Classification of the driver's attentional state in terms of PERCLOS values in 60-second windows.
State | PERCLOS_min | PERCLOS_max
Fully awake | 0.000 | 0.048
Fatigue (semi-drowsy) | 0.048 | 0.125
Drowsy | 0.125 | 1.000

In this work, we characterized the fatigue of the driver using the parameter YF. A high YF (1-4 yawns per minute) is an indicator of fatigue [20]. The estimated GD is used to classify the driver's attentional state as distracted. Normally, the GD of the driver should be frontal, but if the GD is in another direction over a period of time, it is presumed to be distraction. However, while driving, the driver sometimes needs to perform in-vehicle and outside-vehicle viewing tasks, which require him/her to change gaze direction. Keeping those visually demanding tasks in mind, we used the GD estimated over a 2-second window according to the ISO standard (for more details, see [50]). Based on the above discussion, the procedure to estimate the SoA is presented in Algorithm 1.

Algorithm 1 Estimate State of Attention (SoA) for frame f
Require: PERCLOS, YF, GD
if PERCLOS ≥ 0.125 then
    SoA ← Drowsy
else if YF > 1 or 0.048 ≤ PERCLOS < 0.125 then
    SoA ← Fatigue
else if GD is Right or GD is Left then
    SoA ← Distracted
else
    SoA ← Attentive
end if
Return SoA
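Algorithm 1 translates directly into a few lines of Python; the thresholds are those of Table 2 and the YF > 1 fatigue rule described above.

def estimate_soa(perclos, yf, gd):
    """State of Attention for one frame, mirroring Algorithm 1."""
    if perclos >= 0.125:
        return "Drowsy"
    if yf > 1 or 0.048 <= perclos < 0.125:
        return "Fatigue"
    if gd in ("Right", "Left"):
        return "Distracted"
    return "Attentive"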
C. MONITORING AND DECISION MAKING
The system makes its decision based on the SoA estimated for the previous frames. Based on the value of the SoA, this module generates an alert (sound and message signal) for the driver. The system generates an alarm if the driver is found in any of the inattentive (i.e., drowsy, fatigued, or distracted) SoA. The alarm system triggers a sound and displays a message to alert the driver. Fig. 10 shows the output when a drowsy, fatigued, or distracted SoA is detected. The alarm deactivates automatically when the driver gets back to the desired SoA (i.e., attentive). Three different types of beep sounds are used for the drowsy, fatigue, and distracted SoA. As Fig. 10a illustrates, a warning message, "DROWSINESS ALERT", is triggered and displayed when the system estimates that the driver is in a drowsy state according to Algorithm 1. Similarly, warning messages for fatigue and distraction detection are displayed (Fig. 10b and Fig. 10c).

FIGURE 10: Messages and triggering sound: (a) Drowsy SoA, (b) Fatigue SoA, and (c) Distraction SoA.

D. DRIVER'S LEVEL OF ATTENTION ESTIMATION
The parameter values PERCLOS_f, YF_f, and GD_f for each frame f are computed by the previous cue extraction and parameter estimation module. These values are scaled to the range [0, 1] over the measured time series using equations (12)-(14). The maximum and minimum values of each parameter are stored and updated from the beginning of monitoring.

PERCLOS_Scale_f = (PERCLOS_f − min(PERCLOS)) / (max(PERCLOS) − min(PERCLOS)),   (12)
YF_Scale_f = (YF_f − min(YF)) / (max(YF) − min(YF)),   (13)
GD_Scale_f = (GD_f − min(GD)) / (max(GD) − min(GD)).   (14)

The driver's level of attention (LoA) is then calculated from the rescaled values according to (15).

LoA = ((PERCLOS_Scale_f + YF_Scale_f + GD_Scale_f) / 3) × 100%.   (15)

A real-time graph (see Fig. 15, 16 and 17) is generated to show the driver's LoA along with live plots of the frames per second (FPS), PERCLOS, the mouth opening ratio, and the head and eye positions in degrees for each frame.
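A sketch of the running min-max scaling in (12)-(14) and the LoA of (15) is shown below. It assumes a numeric encoding of GD (e.g., the gaze-deviation angle) so that it can be scaled like the other parameters; the class and its API are illustrative only.

# Running min-max scaling (eq. (12)-(14)) and LoA (eq. (15)); illustrative only.
class LoAEstimator:
    def __init__(self):
        self.lo = {}    # running minima, updated from the start of monitoring
        self.hi = {}    # running maxima

    def _scale(self, name, value):
        self.lo[name] = min(self.lo.get(name, value), value)
        self.hi[name] = max(self.hi.get(name, value), value)
        span = self.hi[name] - self.lo[name]
        return (value - self.lo[name]) / span if span else 0.0

    def level_of_attention(self, perclos, yf, gd_numeric):
        """LoA in percent for the current frame."""
        scaled = (self._scale("PERCLOS", perclos) +
                  self._scale("YF", yf) +
                  self._scale("GD", gd_numeric))
        return (scaled / 3.0) * 100.0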
IV. EXPERIMENTAL ANALYSIS
To evaluate the proposed framework, three experiments were carried out. Several measures, such as the false positive rate (FPR), false negative rate (FNR), accuracy, and processing time, are investigated to assess the performance of the system in real driving scenarios.

A. EXPERIMENTAL SETUP AND IMPLEMENTATION DETAILS
The system was implemented and tested in the Python programming language using the OpenCV and Dlib libraries on a general-purpose laptop with an Intel Core i5-4200U processor.
The laptop had 4GB of RAM and a 64-bit Windows 10 operating system. A CMOS webcam (Logitech C170) was connected to the laptop through a USB 2.0 port. The detailed specification of the camera is given in Table 3. Considering the video capture resolution and focal length of the camera, it was mounted on the car dashboard, perpendicularly in front of the participant's face at a distance of 0.8 m. During the evaluation, the participants were requested to sit in the driving seat in front of the camera. Fig. 11 illustrates the experimental setting. A 2011 Toyota Allion was used in the experimental trials.

TABLE 3: Specification of Logitech C170.
Feature | Description
Sensor type | VGA sensor
Video capture resolution | 1024 x 768
Focus type | Fixed
Focal length | 2.3 mm
Maximum frame rate | 30 fps
Additional features | Logitech fluid crystal technology

FIGURE 11: Experimental setup.

B. PARTICIPANTS
A total of 10 healthy drivers (mean age = 31.6, SD = 4.45) with different facial features (beard, glasses, and moustache) and hairstyles participated in the experiments. Seven of them were male and three were female. They had no disabilities, and no remuneration was paid to them. Each driver gave his/her consent before participating. We gave a brief description of the system and explained the whole procedure before conducting the experiments.

C. EXPERIMENT 1: TO VALIDATE THE ESTIMATED PARAMETERS
To investigate the validity of the estimated parameters PERCLOS, YF, and GD, we conducted an experiment with different participants and under various lighting conditions.

1) Experimental Procedure and Data Collection
Three participants (one female and two male) with distinct facial appearances and hairstyles took part in the experiment. During the test, each participant was asked to spend approximately 5 minutes with the system. All of them participated both bare-faced and wearing accessories such as sunglasses, spectacles, and caps. A total of 17 minutes 40 seconds (5 minutes 35 seconds for Participant-1 + 6 minutes for Participant-2 + 6 minutes 5 seconds for Participant-3) of video with involuntary eye blinks and spontaneous yawns was captured by the camera in front of the driver to verify the correctness of the estimated parameters under different daylight conditions. Ideally, the frame rate of the camera is 30 fps, but in practice the system cannot capture video at this rate due to the slow video capturing hardware, the contents of the video, and the computational overhead. A total of 10,805 frames (at an average of 10 fps) were captured and analyzed for this experiment. Table 4 and Table 5 provide the number of frames analyzed separately in each case.

2) Evaluation Measures
Validation of the parameters is done based on two criteria: the false positive rate (FPR) and the false negative rate (FNR), defined in (16) and (17).

FPR = F_Pos / (F_Pos + T_Neg),   (16)
FNR = F_Neg / (F_Neg + T_Pos).   (17)
Here,
• T_Pos is the number of positive instances in a testing sequence whose extracted cues were correctly recognized by the system (for example, the driver is yawning and the system detects the yawn correctly);
• T_Neg is the number of negative instances in a testing sequence that were correctly recognized by the system (for example, the driver's eyes are open and the system detects them as open);
• F_Pos is the number of negative instances in a testing sequence that were wrongly recognized as positive by the system (for example, the system reports a yawn although the driver is not yawning);
• F_Neg is the number of positive instances in a testing sequence that the system failed to recognize (for example, the driver's eyes are closed but the system detects them as open).
them.Each driver has given his/her consent before participat- them them as open);
ing. We gave a brief description of the system and explain the
whole procedure before conducting the experiments. 3) Results
Accuracy of PERCLOS, YF and GD are dependent on the ac-
C. EXPERIMENT 1: TO VALIDATE THE ESTIMATED curacy of extracting cues such as eye state, yawn frequency,
PARAMETERS eye gaze direction and face direction respectively and esti-
To investigate the validity of the estimated parameters PER- mating SoA and LoA. At the first stage, techniques to extract
CLOS, YF, and GD, we conducted an experiment with differ- the cues were evaluated and then the video sequences were
ent participants and considering various lighting conditions. investigated to validate the estimated parameters subjectively.

1) Experimental Procedure and Data Collection a: Face Detection, Tracking and ROI Extraction
Three participants (one female and two male) took part in the Test was conducted on different participants with different
experiment with distinct facial appearances and hairstyles. situations to establish the system’s (i) face detection and
TABLE 4: Detailed data size of Experiment 1 in different situations.
Situation | No. of frames (Participant-1) | No. of frames (Participant-2) | No. of frames (Participant-3) | Total no. of frames
Participant without any accessories | 1103 | 1032 | 1250 | 3385
Participant with spectacles | 959 | 825 | 1065 | 2849
Participant with sunglasses | 678 | 542 | 730 | 1950
Participant wearing cap | 911 | 989 | 721 | 2621
No. of frames for each participant | 3651 | 3388 | 3766 | 10805
Duration of captured video | 335 sec @ 11 fps (approx.) | 360 sec @ 9 fps (approx.) | 365 sec @ 10 fps (approx.) | 17 min 40 sec

TABLE 5: Detailed data size of Experiment 1 under different lighting conditions.
Situation | No. of frames (Participant-1) | No. of frames (Participant-2) | No. of frames (Participant-3) | Total no. of frames
Broad daylight | 2115 | 1932 | 1990 | 6037
Parking garage | 1536 | 1456 | 1776 | 4768
No. of frames for each participant | 3651 | 3388 | 3766 | 10805
Duration of captured video | 335 sec @ 11 fps (approx.) | 360 sec @ 9 fps (approx.) | 365 sec @ 10 fps (approx.) | 17 min 40 sec

Fig. 12 shows some sample frames where the face of a subject in different situations is detected and tracked and the ROI is extracted. Observations from this test suggest that the algorithms used for detection, tracking, and extraction function satisfactorily and accommodate participants in different situations.

b: Cues Extraction
The correctness of the extracted cues (i.e., eye state, yawn detection, eye gaze direction, and face direction) is then investigated using the FPR and FNR.
• In the case of eye state detection, an FPR error occurs when the eyes are in the open state but are detected as closed, and an FNR error occurs when the eyes are in the closed state but the system detects them as open. Table 6 presents the percentages of FPR and FNR. The results suggest that wearing a cap does not affect eye state detection much, but the FNR of eye state detection is greater when wearing spectacles than without them. Our system is unable to detect the eye state when the subject wears sunglasses. Eye state detection was not affected by the lighting condition. Fig. 13 shows the detection of the eye state (both closed and open) by the system.

TABLE 6: Summary of eye state detection.
Situation | FPR (%) | FNR (%)
Subject without spectacles | 1.2 | 3.6
Subject with spectacles | 35 | 18.2
Subject wearing cap | 1.8 | 4.1

• To count the number of yawns, a false positive error occurs when a yawn does not happen but the system detects a yawn, and a false negative error occurs when a yawn occurs but the system does not detect it. A summary of the yawn detection under different lighting conditions is shown in Table 7. The results revealed that wearing spectacles, sunglasses, or a cap does not affect yawn detection, but varying lighting conditions do affect it.

TABLE 7: Summary of yawning detection.
Lighting condition | FPR (%) | FNR (%)
Broad daylight | 3.5 | 9.3
Parking garage | 7.4 | 12.5

• In the case of eye gaze direction and face direction detection, a false positive occurs when the system detects the participant's head/eye as being in the standard direction although it is in a different direction. On the other hand, a false negative occurs when the head or eye is in the standard position but is classified as left or right by the system. Table 8 and Table 9 show the percentages of FPR and FNR for the eye gaze direction and face direction detection modules. As in eye state detection, wearing spectacles affected the performance of the eye gaze detection algorithm, whereas the face direction detection algorithm was not much affected by any of the situations.

TABLE 8: Experimental outcomes for eye gaze detection.
Situation | FPR (%) | FNR (%)
Participant without spectacles | 3.8 | 5.1
Participant with spectacles | 16.7 | 12.5
Participant wearing cap | 3 | 5.7

We also performed an analysis to determine the validity of the detection angle of FD.
To detect the head orientation for FD estimation, we used the standard solution of the Perspective-n-Point (PnP) problem [48]. To verify the range of the detection angle, we asked a participant to rotate her face left and right to make different head orientations. A total of 2.5 minutes of video (i.e., around 1650 frames) was analyzed to observe the estimated Euler yaw angle (α) for different head orientations. The result suggests that the implemented head orientation algorithm can estimate α efficiently within a range of −90° to +90°. However, beyond this range, it is unable to detect the head and so cannot estimate α. Fig. 14 shows head orientations at varying yaw angles. Therefore, we set the detection angle for FD between −90° and 90°.

FIGURE 12: (a) Face detection and tracking, and (b) ROI extraction of a participant bare-faced and wearing different accessories such as spectacles, sunglasses, and a cap.
FIGURE 13: Eye state detection by the system: (a) open eyes, (b) closed eyes.
FIGURE 14: Yaw estimation for different head orientations: (a) −90°, (b) −60°, (c) −32°, (d) 0°, (e) 30°, (f) 61°, (g) 90°.

TABLE 9: Experimental results for face direction detection.
Situation | FPR (%) | FNR (%)
Participant without spectacles | 6.2 | 4.8
Participant with spectacles | 7.1 | 4.2
Participant with sunglasses | 5.8 | 3.5
Participant wearing cap | 6.8 | 5.1

c: Estimation of SoA and LoA
To investigate the effectiveness of the SoA and LoA, we analyzed the video sequences captured by the system and studied the State of Attention (SoA) and Level of Attention (LoA) from the video frames. We mainly studied: (1) the validity of the parameters for the estimation of SoA and LoA; and (2) the correlation between the parameters for the estimation of SoA and LoA. For example, Fig. 15, 16 and 17 present sample snapshots of our monitoring system where fatigue, drowsiness, and distraction are detected. The analysis is presented in three subsections: (a) SoA and LoA based on PERCLOS; (b) SoA and LoA based on PERCLOS and YF; (c) SoA and LoA based on GD.

1) SoA and LoA based on PERCLOS: The objective of this experiment is to estimate the SoA and LoA from PERCLOS. The SoA is classified into attentive, fatigue, and drowsy states according to Algorithm 1. Both fatigue and drowsiness are functions of PERCLOS. If the SoA is in any of the non-attentive states (i.e., fatigue, drowsiness), the monitoring system will generate an alarm, displaying a warning message and a beep sound. Fig. 15a presents a sample snapshot of the monitoring system showing the analysis of frame 33. It shows, from the top, the frame rate (solid blue line), PERCLOS (solid blue line), RM (solid blue line), FD (solid green line) and EGD (solid blue line), and LoA (solid red line) over a period of 60 seconds in the respective subgraphs. In the graph, time in seconds is shown (as the secondary axis) at the top of the graph, and the frame number is shown (as the primary axis) at the bottom of the graph.
No yawning is detected during this period (second label from the top, marked with a red circle), and the gaze direction (third label from the top, marked with a red circle) is frontal for this frame. PERCLOS is computed using (1), and the SoA is estimated according to Algorithm 1. The LoA is given as a percentage (%) and computed using (15).

a) Estimation of SoA: Fig. 15b presents the PERCLOS values (solid blue line) over time (shown at the top) and frames (shown at the bottom). The PERCLOS values define three distinct SoA: attentive, when the PERCLOS value is below the threshold value of 0.048 (dotted yellow line); fatigue, when the PERCLOS value is equal to or above the threshold value of 0.048; and drowsy, when the PERCLOS value is equal to or above the threshold value of 0.125 (Th_p = 0.125, dotted red line). The driver was in the attentive state until frame number 24 (i.e., around 72 seconds), when the PERCLOS value was below 0.048. However, after frame number 24 (i.e., around 72 seconds), the PERCLOS value increases significantly and goes over the threshold value (Th_p = 0.125) after frame number 27 (i.e., around 76 seconds), which falls into the drowsy state and results in generating an alarm. There is an intermediate fatigue state shown in Fig. 15b, but the PERCLOS value was within this region for a very short time. Therefore, no alarm was generated.

b) Estimation of LoA: The lower part of Fig. 15b presents the LoA (in %). There is a correlation between the PERCLOS value and the LoA. The driver has the highest LoA (100%) until frame 24 (i.e., around 72 seconds), and the LoA starts falling as the PERCLOS value increases above Th_p = 0.125, reaching 66% at frame number 27 (i.e., around 76 seconds).

FIGURE 15: Analysis of SoA and LoA based on PERCLOS. (a) Analysis of a frame (frame 33): the four insets on the left represent face detection and tracking (top) and analysis of the eye, head, and mouth regions (bottom), respectively; the five sub-graphs in the middle represent the frame rate, PERCLOS, yawn frequency, gaze direction, and level of attention, from top to bottom; the four labels on the right represent the frame number, YF, GD, and LoA (in %). (b) SoA and LoA analysis based on PERCLOS.

2) SoA and LoA based on PERCLOS and YF: The objective of this experiment is to estimate the SoA and LoA from PERCLOS and YF. A sample snapshot of our monitoring system for frame number 78 is given in Fig. 16a. It shows, from the top, the frame rate (solid blue line), PERCLOS (solid blue line), RM (solid blue line), FD (solid green line) and EGD (solid blue line), and LoA (solid red line) over a period of 160 seconds in the respective subgraphs. In the graph, time in seconds is shown (as the secondary axis) at the top of the graph, and the frame number is shown (as the primary axis) at the bottom of the graph. From this figure, we can see that the gaze direction (third label from the top, marked with a red circle) is frontal. The PERCLOS value is shown (calculated using (1)), the RM value is calculated using (7) and is in turn used to estimate YF according to (4), and the SoA is estimated according to Algorithm 1. The LoA is shown as a percentage (%) and computed using (15).

a) Estimation of SoA: Fig. 16b presents the PERCLOS (top graph) and RM values (solid blue line, middle graph) used for estimating YF over time (shown at the top) and frames (shown at the bottom).
tinct SoA: attentive when the PERCLOS value is


below the threshold value of 0.048 (dotted yellow
line) and YF is equal to or less than 1, and fatigue
when the PERCLOS value is equal to or above
the threshold value 0.048 or YF is greater than
1. We at first analyzed the measured values of
PERCLOS. From Fig. 16b we observed that in
the first 60s, PERCLOS has significant increase
going over the threshold (T hp = 0.125, dotted red
line), which resulted in drowsy state. However,
to estimate the SoA for this frame we need to (a) Analysis of a frame (frame 78): four insets on the left
represent face detection and tracking (top), analysis of eye,
observe the PERCLOS measurement for the last head and mouth region (bottom) respectively;five sub-graphs
60 seconds i.e. from frame number 45 (at 140 in middle represent Frame Rate, PERCLOS, Yawn frequency,
Gaze direction, and Level of Attention from top respectively;
second) (labelled in Fig. 16b). We observed that four labels on the right represents frame number, YF, GD and
in the last 60 seconds, PERCLOS value is below LoA (in %)
0.048, which represents the attentive state.
YF is another parameter considered to estimate
fatigue. Mouth opening ratio (RM ) is used for
estimating YF. We can see RM is showing lots of
ups and down throughout the time (160 seconds)
under observation. To estimate SoA for current
frame, we only need to consider the measure-
ments of the last 60 seconds i.e. from frame
number 45 (at 140 second) (labelled in Fig. 16b).
We observed that during the last 60 seconds,
measurements of RM is close to the threshold
value (T hY = 0.04) thrice resulting in a YF value
of 3 (second label from top marked with red circle
in Fig. 16a) for the current frame and identifying (b) SoA and LoA analysis based on PERCLOS and YF
SoA as fatigue and generated an alarm. FIGURE 16: Analysis of SoA and LoA based on PERCLOS
b) Estimation of LoA: The lower part of Fig. 16b presents the LoA (in %). From Fig. 16b we can see that during the first 60 seconds the LoA value (red solid line) is low (66% to 85%) and starts to increase when the PERCLOS value decreases and falls below the threshold value (i.e., under 0.048). Observing the last 60 seconds, i.e., from frame number 45 (at 140 seconds, labelled in Fig. 16b), we found that the LoA starts to decrease at frame number 72 (at 180 seconds) and reaches 58.33% at frame number 75 (at 200 seconds), as three yawns were detected during this period, which is visible from the measurements of R_M.

3) SoA and LoA based on GD: The objective of this experiment is to estimate the SoA and LoA from GD. Fig. 17a shows a sample snapshot of our monitoring system for frame number 42. From the top, the sub-graphs show the frame rate (solid blue line), PERCLOS (solid blue line), R_M (solid blue line), FD (solid green line) with EGD (solid blue line), and LoA (solid red line) over a period of 60 seconds. In the graph, time in seconds is shown on the top (as the secondary axis) and the frame number at the bottom (as the primary axis). No yawning is detected during this period (second label from the top, marked with a red circle). The gaze direction is left for this frame (third label from the top, marked with a red circle). Here, we analyzed the measurements of FD and EGD. FD and EGD are estimated using the yaw angle α of the head and the direction of the eye center measured using the angle θ, respectively, according to the process described in Section III-A3. GD is calculated using FD and EGD according to (8) and can be left, front, or right. SoA is calculated from the GD value according to Algorithm 1 and can be attentive or distracted. If the SoA is distracted, an alarm is generated. The LoA is given in percentage (%) and computed using (15).
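Since (8) and Algorithm 1 are given earlier in the paper and not repeated here, the sketch below only illustrates the kind of decision they encode; the angle thresholds (±30° for FD, ±8° for EGD) follow the values discussed next, while the sign convention and the way the two cues are combined are our assumptions.

# Illustrative only: gaze direction (GD) from face direction (FD, head yaw
# angle alpha) and eye gaze direction (EGD, eye-centre angle theta), then a
# GD-based SoA. Not the authors' (8) or Algorithm 1.
def direction(angle_deg: float, threshold_deg: float) -> str:
    if angle_deg < -threshold_deg:
        return "left"
    if angle_deg > threshold_deg:
        return "right"
    return "front"

def gaze_direction(alpha: float, theta: float) -> str:
    fd = direction(alpha, 30.0)   # face direction from head yaw
    egd = direction(theta, 8.0)   # eye gaze direction from the eye centre
    return fd if fd != "front" else egd  # assume a head turn dominates

def soa_from_gd(gd: str) -> str:
    return "attentive" if gd == "front" else "distracted"

# Frame 42 in Fig. 17a: theta within +/-8 deg but alpha beyond -30 deg,
# hence GD = left and SoA = distracted, which triggers an alarm.
print(soa_from_gd(gaze_direction(alpha=-35.0, theta=2.0)))  # -> distracted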
a) Estimation of SoA: Fig. 17b presents the FD (solid green line) and EGD (solid blue line) over time (shown on the top) and frames (shown at the bottom), which were used to estimate the GD. The GD values define two distinct SoA: attentive when the GD value is front and distracted when the GD value is either left or right. From Fig. 17b we can see that the FD and EGD measurements vary considerably, i.e., they cross the threshold values (yellow dotted lines for EGD and red dotted lines for FD) throughout the whole sequence (i.e., the 75 seconds) under consideration, but GD estimation requires 2 seconds of observation. Therefore, GD estimation requires observing the past few frames of EGD and FD (i.e., from frame number 39, at 95 seconds). We can see that the participant's EGD measurements were within the threshold range (−8° < θ < 8°), but the FD measurements exceeded the threshold value (i.e., −30°), which resulted in GD being left (third label from the top, marked with a red circle in Fig. 17a). The estimated SoA is distracted, which generates an alarm.

b) Estimation of LoA: The lower part of Fig. 17b presents the LoA (in %). The LoA measurements (red solid line) in Fig. 17b also vary throughout the whole sequence with respect to the other measured parameters. However, observing the last few frames, i.e., from frame number 39 (at 95 seconds), we can see that after reaching its peak the LoA starts to decrease again with respect to the measurements of GD (i.e., FD) and reaches 83.33%.

FIGURE 17: Analysis of SoA and LoA based on GD. (a) Analysis of a frame (frame 42): the four insets on the left represent face detection and tracking (top) and analysis of the eye, head, and mouth regions (bottom); the five sub-graphs in the middle represent frame rate, PERCLOS, yawn frequency, gaze direction, and level of attention (from the top); the four labels on the right represent the frame number, YF, GD, and LoA (in %). (b) SoA and LoA analysis based on GD.

We can conclude from the above investigations that: (1) the estimation of SoA and LoA is accurate, and (2) each parameter can independently determine the SoA, but the estimation of LoA depends not only on the current measurement of a parameter but also on its values in past frames. It is worth mentioning that the correlation between PERCLOS and the two attentional states (fatigue and drowsiness) needs to be further investigated using human subjects with real sleep deprivation.

D. EXPERIMENT 2: TO EVALUATE ATTENTIONAL STATE
The purpose of this experiment is to measure the accuracy of the proposed system in detecting four kinds of attentional status: attentiveness, drowsiness, fatigue, and distraction.

1) Experimental Procedure and Data Collection
The experiment was performed in a controlled environment considering the risk that a driver's inattentive state poses to traffic safety. We requested all the participants to pose various expressions exhibiting the different attentional states. Each participant spent approximately 3.5 minutes, on average, to simulate each state. A total of 140 minutes (10 [participants] × 3.5 [interaction time] × 4 [types of status]) of video was captured by the camera placed perpendicular to the driver's face for analysis. More than 85K frames were analyzed for this experiment (for details, see Table 10).

2) Evaluation Measures
We measured the accuracy (A) of each attentional state using (18):

A = (DF / TF) × 100%,   (18)

where DF is the total number of frames in which the attentional state is correctly detected and TF is the total number of frames in a testing sequence.

3) Results
The accuracy for the different attentional states is shown in Table 10. The results reveal that the proposed system performs quite satisfactorily in estimating the states of attention. The average accuracy in classifying the attentional states ranges from 91% to 95%. The system demonstrated a relatively higher accuracy of 95% for detecting the drowsy state than for the other three states. As the experiment was performed in a controlled environment, the results may vary, especially under real sleep deprivation.
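As a rough sanity check on the reported data volume (the arithmetic below and the assumed ~10 fps processing rate, reported later alongside Table 12, are ours, not taken from the paper):

# 10 participants x 3.5 minutes x 4 states = 140 minutes of video; at about
# 10 fps this is roughly 84,000 frames, consistent with the 85,753 frames
# totalled in Table 10.
participants, minutes_per_state, states, fps = 10, 3.5, 4, 10
total_minutes = participants * minutes_per_state * states   # 140.0
approx_frames = total_minutes * 60 * fps                     # 84000.0
print(total_minutes, approx_frames)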

E. EXPERIMENT 3: TO EVALUATE OVERALL PERFORMANCE
This experiment was conducted to investigate the overall performance of the proposed system in real driving scenarios.
TABLE 10: Accuracy, A (%) of different attentional states.

Participant no. | Drowsiness (DF, TF, A) | Fatigue (DF, TF, A) | Distraction (DF, TF, A) | Attentiveness (DF, TF, A) | Total no. of frames
1  | 2032, 2162, 94  | 2078, 2165, 96 | 1888, 2075, 91 | 2047, 2178, 94 | 8580
2  | 2062, 2104, 98  | 1960, 2130, 92 | 1897, 2132, 89 | 1972, 2120, 93 | 8486
3  | 1901, 2136, 89  | 1890, 2148, 88 | 1992, 2142, 93 | 1890, 2100, 90 | 8526
4  | 2136, 2180, 98  | 1948, 2164, 90 | 2027, 2156, 94 | 1993, 2120, 94 | 8620
5  | 2140, 2140, 100 | 2134, 2270, 94 | 2192, 2260, 97 | 2212, 2280, 97 | 8950
6  | 1910, 1990, 96  | 1890, 2100, 90 | 1762, 2002, 88 | 1835, 2017, 91 | 8109
7  | 2109, 2109, 100 | 1967, 2115, 93 | 1917, 2130, 90 | 1969, 2095, 94 | 8449
8  | 2069, 2201, 94  | 1875, 2180, 86 | 1933, 2172, 89 | 1917, 2130, 90 | 8683
9  | 1945, 2210, 88  | 2016, 2215, 91 | 1957, 2275, 86 | 1954, 2220, 88 | 8920
10 | 1881, 2090, 90  | 2003, 2131, 94 | 1887, 2029, 93 | 2006, 2180, 92 | 8430
Average A (%) | 95 | 91 | 91 | 92 | Total: 85753
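The per-state accuracies in Table 10 (and the per-participant values in Table 11 below) are plain frame ratios following (18). A minimal sketch, with illustrative names and the frame counts taken from the first row of Table 10:

# Frame-level accuracy as defined in (18): A = DF / TF x 100.
def accuracy(df: int, tf: int) -> float:
    return 100.0 * df / tf

# Participant 1 in Table 10: (DF, TF) per attentional state.
states = {"drowsiness": (2032, 2162), "fatigue": (2078, 2165),
          "distraction": (1888, 2075), "attentiveness": (2047, 2178)}
for state, (df, tf) in states.items():
    print(f"{state}: {accuracy(df, tf):.0f}%")  # 94%, 96%, 91%, 94%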

1) Experimental Procedure and Data Collection
All the participants were requested to drive the car at an average speed of 25 km/hr and were asked to yawn and blink randomly. The driving activity was videotaped by the camera placed in front of the driver to estimate the frame processing time in terms of detection, tracking, and extraction, and to estimate the overall accuracy. Each trial began with a predefined positioning of the participant. On average, 4 trials were conducted with each participant, and each trial took 6 minutes, amounting to approximately 4 hours in total. The average total distance travelled by each driver is 10 km. More than 130K frames were analyzed to determine the number of frames in which the system showed the participants' correct attentional state.

TABLE 11: Overall accuracy (A) of the system.

Participant no. | No. of trials | DF     | TF     | A (%)
1               | 4             | 10829  | 11520  | 94
2               | 3             | 9828   | 10800  | 91
3               | 5             | 14418  | 16200  | 89
4               | 3             | 10260  | 10800  | 95
5               | 5             | 19404  | 19800  | 98
6               | 4             | 12096  | 14400  | 84
7               | 3             | 7776   | 8640   | 90
8               | 4             | 12528  | 14400  | 87
9               | 5             | 13392  | 14400  | 93
10              | 4             | 15048  | 15840  | 95
Total           | 40            | 125579 | 136800 | 92

2) Results
The overall performance of the system was measured in terms of accuracy and processing time.
• The overall performance of the framework was calculated using (18). Table 11 shows the overall accuracy of the system for each participant individually. Analyzing the data in Table 11, we can see that the overall accuracy of the system ranges from 84% to 95%, and that Participant-6 and Participant-8 have lower accuracy compared to the other participants. After analyzing the video sequences of these participants, we have the following observations:
-- Two out of four trials of Participant-6 were conducted in a night environment in city areas, with light coming only from city street lights. Thus, the system sometimes estimated an incorrect attentional state when it was partially or completely dark due to driving under/on a fly-over, broken street lights, or a blackout. As we used a simple web camera, our system is unable to extract facial cues and estimate the parameters (PERCLOS, YF, and GD) accurately enough to classify the state when it is partially or completely dark outside, which resulted in a lower overall performance of the system.
-- Participant-8 wore spectacles throughout all the driving trials. Due to light falling on the spectacles during broad daylight, the system estimated an incorrect attentional state, as cues from the eye region play a major role in classifying the attentional state. Both the PERCLOS and GD estimations depend on cues from the eye region; thus, the reflection of light on the spectacles led to wrong estimations of PERCLOS and GD, resulting in decreased performance.
The above discussion shows that the system's overall accuracy depends directly on the correct estimation of the parameters. The results also revealed that the system has an average overall accuracy of 92%.
• The total processing time (T_total) for each frame is calculated by (19):

T_total = T_FD + T_C + T_P + T_LoA + T_SoA,   (19)

where T_FD and T_C denote the time elapsed by the system to detect the face and extract cues, T_P and T_LoA indicate the time spent by the system to estimate the parameters and the driver's level of attention (LoA), and T_SoA denotes the time spent by the system to classify the driver's attentional state (SoA). Table 12 shows the average time taken by each task, calculated from the collected data. From the analysis of the data, it is revealed that the proposed framework takes 107 milliseconds (msec) in total to process each frame, resulting in a frame rate of 10 fps.
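To illustrate how the stage timings in (19) could be accumulated per frame, here is a minimal sketch; the stage functions, names, and use of time.perf_counter are ours, not the authors' implementation. With the averages in Table 12 (13 + 48 + 30 + 7 + 9 ms), the total is about 107 ms per frame, i.e., roughly 10 fps.

import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

def process_frame(frame, detect_face, extract_cues, estimate_params,
                  estimate_loa, classify_soa):
    # Each stage of (19) is timed separately, mirroring Table 12.
    face, t_fd = timed(detect_face, frame)
    cues, t_c = timed(extract_cues, face)
    params, t_p = timed(estimate_params, cues)
    loa, t_loa = timed(estimate_loa, params)
    soa, t_soa = timed(classify_soa, params)
    t_total = t_fd + t_c + t_p + t_loa + t_soa  # eq. (19)
    return soa, loa, t_total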
TABLE 12: Average processing time (in msec).

T_FD | T_C | T_P | T_LoA | T_SoA
13   | 48  | 30  | 7     | 9

F. DISCUSSION
We developed a real-time, non-intrusive, vision-based driver's attention monitoring system. The evaluation results suggest that our system performs accurately under different conditions with different participants. Since the reported non-intrusive vision-based systems used different parameters and input devices, and also estimated different outputs, we are unable to compare our system with them in terms of performance metrics. However, we summarize the existing non-intrusive vision-based techniques [6]–[10], [12]–[15], [17] along with the proposed system in Table 13. The techniques are summarized in terms of capturing sensor, used attentional parameters, estimated output, and alert system.
In the following, a number of key observations are highlighted:
• The capturing sensors used by the reported systems were expensive. Chowdhury et al. [13] used a Kinect sensor to estimate the driver's attention level, and in [16] a fixed specialized camera was used with a Raspberry Pi integrated with a SIM card. Multiple cameras were used in [14], [15], [17], and Chien et al. [12] used an IR camera to capture video sequences. Wang et al. [6] relied on an eye tracker. Similar to us, [7]–[9], [11] also used a normal camera, but the cameras used by Tsai et al. [11] and Shibli et al. [9] were more expensive than the one used in the proposed system.
• Although Alam et al. [7], [8] used similar parameters to estimate the attention state, they did not develop any alarm system to alert the driver at a lower level of attention during driving. Also, in [8] only distraction was detected, and in [7] the system was evaluated with only three participants. Moreover, their systems' overall accuracy is lower than that of the proposed system. The attentional parameters used by the systems in [6], [10], [11], [14] were partially similar to ours, but the estimated outputs of those systems are different; for example, the systems described in [11], [6], and [10] focused on detecting fatigue behavior only, whereas [14] concentrated on gaze direction/eyes off the road. The methods in [15] and [17] analyzed the driver's body movements and facial expressions to classify the driver's activity. Shibli et al. [9] and Chowdhury et al. [13] used entirely different parameters, such as face angle and lip motion, to estimate the attention level.
• The proposed system generates a different sound alert and message for each inattentional state. The systems developed in [9] and [13] also display a warning message and sound, similar to the proposed system. The system reported in [17] has a voice alert system, while [12] has an alert system which generates a warning, but the type of warning is not specified. The systems in [7], [8], [10], [11], [14]–[16] have not included any alert system.
Based on the above observations, it is evident that the proposed attention monitoring framework is simple and affordable compared to existing non-intrusive vision-based techniques.
The current version of the proposed system can extract the driver's attentional cues/features and classify them into attentive, drowsy, fatigued, and distracted, but numerous promising dimensions could be included for further improvement. The following issues can be addressed to make the system more functional:
• Distraction can be categorized into visual (e.g., looking away from the roadway and focusing on something else), cognitive (e.g., the mind rambling), auditory (e.g., focusing on a ringing cell phone or loud music), and biomechanical (e.g., using a cell phone or adjusting an audio device) distraction. In this work, we selected visual distraction as we mainly focused on developing a vision-based system and used facial cues to classify the driver's attentional state. However, the remaining types of distraction are also major causes of unsafe driving. Therefore, a module may be developed to detect the various types of distraction, including additional sensors mounted on the car for the necessary data accumulation. We expect that the addition of this module will improve the comprehensive functionality and robustness of the attention monitoring system.
• Impairment due to alcohol is another vital factor that causes traffic crashes along with inattentive driving. Thus, an alcohol detection module can be incorporated, which will generate an alarm when the system detects that the driver is under the influence of alcohol by analyzing visual characteristics.
• To ensure safe driving and to deal with traffic accidents, a vehicle tracking module may be incorporated and a connection established to a server for effective monitoring and notification of traffic incidents.
In addition to the above, the proposed system should be improved to deal with driving at night on highways with no street lights, driving under real sleep deprivation, and different driving positions. A statistical model can be utilized to improve the robustness and accuracy of the current implementation.
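The alert behaviour summarized for the proposed system in Table 13 (a different message and sound for each inattentive SoA) can be pictured as a simple lookup; the message wording and sound file names below are illustrative assumptions, not taken from the paper.

# Illustrative only: mapping each inattentive SoA to an alert message and a
# sound cue; the attentive state produces no alert.
ALERTS = {
    "drowsy":     ("Drowsiness detected - please take a break", "tone_long.wav"),
    "fatigue":    ("Fatigue detected - consider resting",       "tone_double.wav"),
    "distracted": ("Please keep your eyes on the road",         "tone_short.wav"),
}

def alert_for(soa: str):
    return ALERTS.get(soa)  # returns None for "attentive"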

TABLE 13: Summary of non-intrusive visual-feature based systems.

Article | Capturing sensor | Used attentional parameters | Estimated output | Alert system
Gumaei et al. [16] | Camera with a Raspberry Pi edge device integrated with a SIM card | Driver's behaviour | Distracted behaviour | None
Zhang et al. [15] | Cameras | Body movements, facial expressions | Driver's activity | None
Wang et al. [6] | Eye tracker | Eye gaze, PERCLOS, blink frequency | Fatigue behavior in three classes: normal, warning, and danger | None
Tran et al. [17] | Cameras, microphone | Body movements | Normal and distracted behaviors | Voice alert
Alam et al. [7] | Webcam | PERCLOS, yawn frequency, gaze direction | Attention status: drowsy, fatigue, distracted, and normal | None
Alam et al. [8] | Webcam | Gaze direction | Attention status: distracted and normal | None
Shibli et al. [9] | Webcam | EAR, face angle, lip motion | Attention class: no attention and attention | Sound
Chien et al. [12] | Infrared camera | Pupil | Drowsiness | Warning
Mandal et al. [10] | Wide-angle cameras | PERCLOS | Attention class: normal and fatigue | None
Chowdhury et al. [13] | Kinect sensor | Face angle, lip motion | Attention class: no attention and attention | Message and sound
Vicente et al. [14] | Webcam | Head pose | Eyes off road | None
Tsai et al. [11] | Webcam | rPPG, PERCLOS, yawning state | Fatigue state | None
Proposed system | Webcam | PERCLOS, yawn frequency, gaze direction | Attention status: drowsy, fatigue, distracted, and normal | Messages and sound based on SoA

V. CONCLUSION
A vision-based framework for determining drivers' attention states is presented in this paper. The framework is able to detect the inattentiveness of drivers in day or night environments when dim light is available (i.e., in city areas). The evaluation shows that the proposed framework functions well, with 92% accuracy, in real-time driving scenarios. We plan to add more detection modules, such as cognitive, auditory, and biomechanical distraction detection, alcohol detection, and vehicle tracking and monitoring, in the future. Additionally, the proposed system may be extended to function under more challenging and practical driving scenarios. The proposed system may be installed in vehicles and make a substantial impact in reducing road crashes and saving human lives.

VI. ACKNOWLEDGEMENT
This research work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant RGPIN-2020-06080.

REFERENCES
[1] “The global status report on road safety 2018,” World Health Organization (WHO), Geneva 27, Switzerland, WHO Report, Dec. 2018.
[2] P. T. Gimeno, G. P. Cerezuela, and M. C. Montañés, “On the concept and measurement of driver drowsiness, fatigue and inattention: implications for countermeasures,” Int. J. Veh. Des., vol. 42, no. 1/2, pp. 67–68, 2006.
[3] M. A. Regan, C. Hallett, and C. P. Gordon, “Driver distraction and driver inattention: Definition, relationship and taxonomy,” Accident Analysis & Prevention, vol. 43, no. 5, pp. 1771–1781, 2011.
[4] M. Gastaldi, R. Rossi, and G. Gecchele, “Effects of driver task-related fatigue on driving performance,” Procedia - Social and Behavioral Sci., vol. 111, pp. 955–964, 2014.
[5] Y. Dong, Z. Hu, K. Uchimura, and N. Murayama, “Driver inattention monitoring system for intelligent vehicles: A review,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 596–614, June 2011.
[6] Y. Wang, R. Huang, and L. Guo, “Eye gaze pattern analysis for fatigue detection based on gp-bcnn with esm,” Pattern Recognition Lett., vol. 123, pp. 61–74, 2019.
[7] L. Alam and M. M. Hoque, “Vision-based driver’s attention monitoring system for smart vehicles,” in Proc. Intell. Computing & Optimization (ICO’18), P. Vasant, I. Zelinka, and G.-W. Weber, Eds. Cham: Springer International Publishing, 2019, pp. 196–209.
[8] L. Alam and M. M. Hoque, “Real-time distraction detection based on driver’s visual features,” in Proc. IEEE Int. Conf. Elect. Comput. Commun. Eng. (ECCE’19), Feb 2019, pp. 1–6.
[9] A. M. Shibli, M. M. Hoque, and L. Alam, “Developing a vision-based driving assistance system,” in Proc. Emerging Technolog. Data Mining Inform. Security, A. Abraham, P. Dutta, J. K. Mandal, A. Bhattacharya, and S. Dutta, Eds. Singapore: Springer Singapore, 2019, pp. 799–812.
[10] B. Mandal, L. Li, G. S. Wang, and J. Lin, “Towards detection of bus driver fatigue based on robust visual analysis of eye state,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 3, pp. 545–557, March 2017.
[11] Y. Tsai, P. Lai, P. Huang, T. Lin, and B. Wu, “Vision-based instant measurement system for driver fatigue monitoring,” IEEE Access, vol. 8, pp. 67342–67353, 2020.

[12] J.-C. Chien, Y.-S. Chen, and J.-D. Lee, “Improving night time driving safety using vision-based classification techniques,” Sensors, vol. 17, no. 10, 2017.
[13] P. Chowdhury, L. Alam, and M. M. Hoque, “Designing an empirical framework to estimate the driver’s attention,” in Proc. IEEE Int. Conf. Inform. Electron. Vision (ICIEV’16), May 2016, pp. 513–518.
[14] F. Vicente, Z. Huang, X. Xiong, F. De la Torre, W. Zhang, and D. Levi, “Driver gaze tracking and eyes off the road detection system,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 2014–2027, Aug 2015.
[15] C. Zhang, R. Li, W. Kim, D. Yoon, and P. Patras, “Driver behavior recognition via interwoven deep convolutional neural nets with multi-stream inputs,” IEEE Access, vol. 8, pp. 191138–191151, 2020.
[16] A. Gumaei, M. Alrakhami, M. Hassan, A. Alamri, M. Alhussein, M. A. Razzaque, and G. Fortino, “A deep learning-based driver distraction identification framework over edge cloud,” Neural Computing and Applications, pp. 1–16, 2020.
[17] D. Tran, H. Manh Do, W. Sheng, H. Bai, and G. Chowdhary, “Real-time detection of distracted driving based on deep learning,” IET Intell. Transp. Syst., vol. 12, no. 10, pp. 1210–1219, 2018.
[18] S. Kaplan, M. A. Guvensan, A. G. Yavuz, and Y. Karalurt, “Driver behavior analysis for safe driving: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 6, pp. 3017–3032, 2015.
[19] D. S. Bowman, W. A. Schaudt, and R. J. Hanowski, Handbook of Intelligent Vehicles. London: Springer London, 2012, ch. Advances in Drowsy Driver Assistance Systems Through Data Fusion, pp. 895–912.
[20] S. Gupta and S. Mittal, “Yawning and its physiological significance,” Int. J. Appl. Basic Med. Res., vol. 3, no. 1, pp. 11–15, 2013.
[21] J. Grippenkoven and S. Dietsch, “Gaze direction and driving behavior of drivers at level crossings,” J. Transp. Safety & Security, vol. 8, no. sup1, pp. 4–18, 2016.
[22] J. G. Taylor and N. F. Fragopanagos, “The interaction of attention and emotion,” Neural Netw., vol. 18, no. 4, pp. 353–369, 2005.
[23] A. Zivony, A. S. Allon, R. Luria, and D. Lamy, “Dissociating between the N2pc and attentional shifting: An attentional blink study,” Neuropsychologia, vol. 121, pp. 153–163, 2018.
[24] S. Benedetto, M. Pedrotti, L. Minin, T. Baccino, A. Re, and R. Montanari, “Driver workload and eye blink duration,” Transp. Res. Part F: Traffic Psychology and Behaviour, vol. 14, no. 3, pp. 199–208, 2011.
[25] E. D. Valck and R. Cluydts, “Slow-release caffeine as a countermeasure to driver sleepiness induced by partial sleep deprivation,” J. Sleep Res., vol. 10, no. 3, pp. 203–209, 2001.
[26] F. Guede-Fernandez, M. Fernandez-Chimeno, J. Ramos-Castro, and M. A. Garcia-Gonzalez, “Driver drowsiness detection based on respiratory signal analysis,” IEEE Access, vol. 7, pp. 81826–81838, 2019.
[27] J. Moon, Y. Kwon, J. Park, and W. C. Yoon, “Detecting user attention to video segments using interval EEG features,” Expert Syst. Appl., vol. 115, pp. 578–592, 2019.
[28] G. Li and W. Chung, “Combined EEG-gyroscope-tDCS brain machine interface system for early management of driver drowsiness,” IEEE Trans. Human-Mach. Syst., vol. 48, no. 1, pp. 50–62, Feb 2018.
[29] A. Sahayadhas, K. Sundaraj, M. Murugappan, and R. Palaniappan, “Physiological signal based detection of driver hypovigilance using higher order spectra,” Expert Syst. Appl., vol. 42, no. 22, pp. 8669–8677, 2015.
[30] J. Zhang, Z. Yin, and R. Wang, “Recognition of mental workload levels under complex human–machine collaboration by using physiological features and adaptive support vector machines,” IEEE Trans. Human-Mach. Syst., vol. 45, no. 2, pp. 200–214, April 2015.
[31] A. Alamri, A. Gumaei, M. Al-Rakhami, M. M. Hassan, M. Alhussein, and G. Fortino, “An effective bio-signal-based driver behavior monitoring system using a generalized deep learning approach,” IEEE Access, vol. 8, pp. 135037–135049, 2020.
[32] C. C. Liu, S. G. Hosking, and M. G. Lenné, “Predicting driver drowsiness using vehicle measures: Recent insights and future challenges,” J. Safety Res., vol. 40, no. 4, pp. 239–245, 2009.
[33] A. Fernández, R. Usamentiaga, J. L. Carús, and R. Casado, “Driver distraction using visual-based sensors and algorithms,” Sensors, vol. 16, no. 11, 2016.
[34] J. M. Ramirez, M. D. Rodriguez, A. G. Andrade, L. A. Castro, J. Beltran, and J. S. Armenta, “Inferring drivers’ visual focus attention through head-mounted inertial sensors,” IEEE Access, vol. 7, pp. 185422–185432, 2019.
[35] K. Takemura, K. Takahashi, J. Takamatsu, and T. Ogasawara, “Estimating 3-d point-of-regard in a real environment using a head-mounted eye-tracking system,” IEEE Trans. Human-Mach. Syst., vol. 44, no. 4, pp. 531–536, Aug 2014.
[36] M. Gjoreski, M. Z. Gams, M. Lustrek, P. Genc, J. Garbas, and T. Hassan, “Machine learning and end-to-end deep learning for monitoring driver distractions from physiological and visual signals,” IEEE Access, vol. 8, pp. 70590–70603, 2020.
[37] Z. Guo, Y. Pan, G. Zhao, S. Cao, and J. Zhang, “Detection of driver vigilance level using EEG signals and driving contexts,” IEEE Trans. Rel., vol. 67, no. 1, pp. 370–380, March 2018.
[38] Y. Yao, X. Zhao, H. Du, Y. Zhang, and J. Rong, “Classification of distracted driving based on visual features and behavior data using a random forest method,” Transp. Res. Rec., vol. 2672, no. 45, pp. 210–221, 2018.
[39] P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J. Comput. Vision, vol. 57, no. 2, pp. 137–154, May 2004.
[40] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, “Discriminative scale space tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 8, pp. 1561–1575, Aug 2017.
[41] V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition (CVPR’14). Washington, DC, USA: IEEE Computer Society, 2014, pp. 1867–1874.
[42] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, “A semi-automatic methodology for facial landmark annotation,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition Workshops, June 2013, pp. 896–903.
[43] T. Soukupova and J. Cech, “Eye blink detection using facial landmarks,” in Proc. 21st Comput. Vision Winter Workshop, Rimske Toplice, Slovenia, Feb 2016.
[44] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62–66, Jan 1979.
[45] S. Suzuki and K. Abe, “Topological structural analysis of digitized binary images by border following,” Comput. Vision Graph. Image Process., vol. 30, no. 1, pp. 32–46, 1985.
[46] S.-T. Wu, A. C. G. da Silva, and M. R. G. Marquez, “The Douglas-Peucker algorithm: sufficiency conditions for non-self-intersections,” J. Brazilian Comput. Soc., vol. 9, pp. 67–84, 2004.
[47] M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Trans. Inf. Theory, vol. 8, no. 2, pp. 179–187, February 1962.
[48] F.-J. Chang, A. T. Tran, T. Hassner, I. Masi, R. Nevatia, and G. Medioni, “FacePoseNet: Making a case for landmark-free face alignment,” in Proc. IEEE Int. Conf. Comput. Vision Workshops (ICCVW’17), Oct 2017.
[49] J. Jiménez-Pinto and M. Torres-Torriti, “Optical flow and driver’s kinematics analysis for state of alert sensing,” Sensors, vol. 13, no. 4, pp. 4225–4257, 2013.
[50] “Road vehicles — Measurement and analysis of driver visual behaviour with respect to transport information and control systems,” Standard ISO 15007:2020, 2020.

LAMIA ALAM received her B.Sc. and M.Sc. degrees in Computer Science and Engineering from Chittagong University of Engineering & Technology (CUET), Bangladesh, in 2014 and 2018, respectively. Currently, she is working as an Assistant Professor in the Department of Computer Science and Engineering (CSE) at Chittagong University of Engineering & Technology (CUET), Bangladesh. Her research interests include Human-Computer Interaction (HCI), Computer Vision (CV), and Machine Learning (ML). Ms. Alam is a member of IEEE and an associate member of the Institution of Engineers, Bangladesh (IEB).

MOHAMMED MOSHIUL HOQUE is a Distinguished Professor of the Department of Computer Science & Engineering (CSE), CUET. He received his Ph.D. from the Dept. of Information and Computer Sciences, Saitama University, Japan, in 2012. He is the former Head of the Department of Computer Science & Engineering, CUET. Currently, he is serving as the Director of Students' Welfare, CUET, and Director of the Sheikh Kamal IT Business Incubator in CUET. He served as TPC Chair, TPC Co-chair, Publication Chair, and TPC member in several international conferences. Dr. Hoque was the Award Coordinator (2016-17), Conference Coordinator (2017-18), and Vice-chair (Technical) (2018-20) of the IEEE Bangladesh Section. Moreover, he served as Vice-chair (Activity) (2018-19) and Award Coordinator (2017-18) of the IEEE Computer Society Bangladesh Chapter, and Educational Activity Coordinator (2017-20) of the IEEE Robotics & Automation Society, Bangladesh Chapter, respectively. He is the founding Director of the CUET Natural Language Processing Lab and Fab Lab CUET. He has published more than 125 publications in several international journals and conferences. His research interests include Human Robot/Computer Interaction, Artificial Intelligence, Machine Learning, and Natural Language Processing. Dr. Hoque is a senior member of IEEE, the IEEE Computer Society, the IEEE Robotics and Automation Society, IEEE Women in Engineering, the IEEE Signal Processing Society, USA, and a Fellow of the Institute of Engineers, Bangladesh.

M. ALI AKBER DEWAN received the B.Sc. degree in Computer Science and Engineering from Khulna University, Bangladesh, in 2003, and the Ph.D. degree in Computer Engineering from Kyung Hee University, South Korea, in 2009. From 2003 to 2008, he was a Lecturer with the Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Bangladesh, where he was an Assistant Professor in 2009. From 2009 to 2012, he was a Postdoctoral Researcher with Concordia University, Montreal, QC, Canada. From 2012 to 2014, he was a Research Associate with the École de Technologie Supérieure, Montreal. He is currently an Assistant Professor with the School of Computing and Information Systems, Athabasca University, Canada. He has published more than 50 articles in high impact journals and conference proceedings. His research interests include artificial intelligence, affective computing, computer vision, data mining, information visualization, machine learning, biometric recognition, medical image analysis, and health informatics. He has served as an editorial board member, a Chair/Co-Chair, and a TPC member for several prestigious journals and conferences. He received the Dean's Award and the Excellent Research Achievement Award for his academic performance and research achievements during his Ph.D. studies in South Korea. Dr. Dewan is a member of IEEE.

NAZMUL SIDDIQUE is with the School of Computing, Engineering and Intelligent Systems, Ulster University. He obtained the Dipl.-Ing. degree in Cybernetics from the TU Dresden, Germany, the MSc in Computer Science from BUET, Bangladesh, and the PhD in Intelligent Control from the Department of Automatic Control and Systems Engineering, University of Sheffield, England. His research interests include robotics, cybernetics, computational intelligence, nature-inspired computing, stochastic systems, and vehicular communication. He has published over 170 research papers in the broad area of computational intelligence, vehicular communication, robotics, and cybernetics. He authored and co-authored five books published by John Wiley, Springer, and Taylor & Francis. He guest edited eight special issues of reputed journals on Cybernetic Intelligence, Computational Intelligence, Neural Networks, and Robotics. He has been involved in organizing many national and international conferences and co-edited seven conference proceedings. Dr. Siddique is a Fellow of the Higher Education Academy, a senior member of IEEE, and a member of different committees of the IEEE SMCS. He is on the editorial board of Nature Scientific Research, Journal of Behavioural Robotics, Engineering Letters, International Journal of Machine Learning and Cybernetics, International Journal of Applied Pattern Recognition, and International Journal of Advances in Robotics Research, and also on the editorial advisory board of the International Journal of Neural Systems.

INAKI RANO holds an MSc in Physics and a Ph.D. in Computer Sciences from the University of the Basque Country (Spain). From 1997 to 2004 he was a research assistant with the San Sebastian Technology Park and a member of the Robotics and Autonomous Systems Group of the University of the Basque Country, working in control architectures for mobile robots. In 2005 he joined the Computer Science and Systems Engineering department of the University of Zaragoza (Spain), where he developed several biologically inspired navigation strategies for mobile robots. During 2008 he was on sabbatical leave at the University of Essex (UK), working on the application of systems identification to the generation of bio-inspired controllers. In 2011 he joined the Institute for Neural Computation of the Ruhr-Universitaet Bochum (Germany). He was a Lecturer on Cognitive Robotics in the School of Computing and Intelligent Systems of Ulster University between 2013 and 2018. Nowadays, he is an Assistant Professor with SDU Biorobotics at the University of Southern Denmark. His current research interest focuses on the applications of dynamical systems and control theory to biorobotics.


IQBAL H. SARKER received his Ph.D. from the Department of Computer Science and Software Engineering, Swinburne University of Tech-
nology, Melbourne, Australia in 2018. Currently,
he is working as a faculty member of the De-
partment of Computer Science and Engineering at
Chittagong University of Engineering and Tech-
nology. His professional and research interests
include - Data Science, Machine Learning, AI-
Driven Computing, NLP, Cybersecurity Analytics,
Behavioral Analytics, IoT-Smart City Technologies, and Healthcare Ana-
lytics. He has published a number of papers in peer-reviewed journals and confer-
ences in top venues, such as Journals (Journal of Network and Computer
Applications – Elsevier, USA; Internet of Things – Elsevier; Journal of Big
Data – Springer Nature, UK; Mobile Network and Applications – Springer,
Netherlands; Sensors - Switzerland; The Computer Journal, Oxford Univer-
sity Press, UK; IEEE Transactions on Artificial Intelligence, IEEE Access,
USA and so on) and Conferences (IEEE DSAA – Canada; IEEE Percom
– Greece; ACM Ubicomp – USA and Germany; ACM Mobiquitous –
Australia; Springer PAKDD – Australia; Springer ADMA - China, and so
on). He is one of the research founders of the International AIQT foundation,
Switzerland, and a member of ACM and IEEE.
