Active Vision-Based Attention Monitoring System for Non-Distracted Driving
Digital Object Identifier 10.1109/ACCESS.2021.3058205
ABSTRACT Inattentive driving is a leading cause of road accidents, responsible for more deaths than speeding or drunk driving. Considerable research effort has gone into monitoring drivers' attentional states and providing support to drivers. Both invasive and non-invasive methods have been applied to track drivers' attentional states, but most of these methods either rely on costly dedicated equipment or use sensors that cause discomfort. In this paper, a vision-based scheme is proposed for monitoring the attentional states of drivers. The system comprises four major modules: cue extraction and parameter estimation, monitoring and decision making, level-of-attention estimation, and an alert system. The system estimates the attentional level and classifies the attentional states based on the percentage of eyelid closure over time (PERCLOS), the frequency of yawning, and gaze direction. Experiments conducted with human participants to assess the performance of the proposed scheme demonstrate its effectiveness with 92% accuracy.
INDEX TERMS Computer vision, attentional states, attention monitoring, human-computer interaction,
driving assistance, gaze direction
…pattern and so on. Most of these sensors require direct contact with the skin, which is intrusive for the users and often provides distorted information [5].

On the other hand, non-invasive methods do not require any contact with the body, and the imaging sensor can collect attentional information of the subject from a distance. Vision-based attentional information includes facial features and body movements. The most commonly used non-invasive attentional cues in currently available vision-based systems are eyelid movements (e.g., eye blink frequency, closure duration) [6]–[11], eye gaze [6]–[8], [12], head movement [7]–[9], [13], [14], facial expressions (e.g., yawning, lip movements, etc.) [7], [11], [13], [15], and body movements (such as hand movement) [15]–[17]. However, these existing vision-based systems have some limitations: (i) the capturing sensors used by the aforementioned systems are either expensive cameras [7]–[10], [14], [15], [17] with additional sensors/hardware [11], [12], [16], [17] or specialized imaging sensors (e.g., eye tracker [6] and Kinect [13]); (ii) some of the systems used only a single parameter such as the pupil [12], PERCLOS [10] or head pose [14] to estimate the driver's attentional state, making the system unable to adapt to situations that are common in real driving scenarios (for example, turning the head or wearing sunglasses can hide the eyes), resulting in incorrect attentional state detection; (iii) some existing systems detect a single inattentional state [8]–[14], [17] or are limited to levels of that same state [6], whereas others focus on detecting the driver's activity [15], [16]; (iv) some systems do not have an alert system to warn the driver when an inattentional state is detected during driving [6]–[8], [10], [11], [14]–[16]; (v) some of the previous works [11], [15], [17] were evaluated in a simulated environment and may not work accurately in real driving scenarios; and (vi) no evidence was provided that the systems mentioned above work in diverse situations (e.g., drivers having different facial features such as beard, moustache, and hairstyles, or wearing accessories such as spectacles, sunglasses, and caps) which are common in real driving scenarios.

In this paper, we intend to overcome the above limitations. We propose a vision-based system that extracts the driver's attentional cues/features to estimate the attentional state and classifies it into attentive, drowsy, fatigued and distracted. The system also alerts the driver in any of the inattentional states, i.e., drowsiness, fatigue or distraction. Fatigue, drowsiness, and visual distraction are major causes of inattentiveness usually encountered during unsafe driving, and there is a strong correlation between fatigue, drowsiness, visual distraction and drivers' facial cues [18]. Among the aforementioned non-invasive attentional cues, we found the percentage of eyelid closure over time (PERCLOS) [19], the frequency of yawning [20] and the gaze direction [21] to be the most useful indicators for monitoring drivers' attention. Thus, in this work, we have used these parameters to estimate the driver's attentional state. The proposed framework is similar in concept to our previous framework developed in [7]. However, the main differences between them are in terms of function and data. Our previous framework was developed for real-time attentional state detection only and was assessed with a limited number of tests. The present framework is developed for vision-based attention monitoring of drivers in real time. This work demonstrates how the proposed framework classifies the drivers' attentional state and measures the level of attention by extracting visual cues. Moreover, the proposed system is evaluated in real driving scenarios under diverse situations (i.e., drivers having different facial features such as beard, moustache, and hairstyles, or wearing accessories), which proves its efficiency and accuracy.

A number of approaches have been taken to predict the driver's attentional state in order to improve traffic safety and reduce the number of accidents. However, most of the available systems for driver attention monitoring are either expensive or limited to special high-end car models. Such systems are not affordable for low-income drivers or in developing countries. Thus, an attention monitoring system should act as a smart driving assistant that maintains a good balance between affordability and functionality. An effective and user-friendly system can save people's lives. The major contributions of this work are:
• Propose a vision-based framework that can constantly track the attentional states and the level of attention of the driver.
• Develop an awareness system that generates an alarm for the driver if an inattentional state is detected.
• Evaluate the performance of the proposed framework in real driving scenarios.
The rest of the paper is organized as follows: Section II presents the related work. Section III provides a brief description of the proposed attention monitoring system. Section IV presents a number of experiments with corresponding results and discusses future work. Finally, Section V concludes the paper with a brief summary.

II. RELATED WORK
Attention is an important activity of the brain that reduces the information flow into the brain's sensory system. It enhances the relevant or vital parts of the input stream and discards disruptions [22]. Zivony et al. [23] investigated spatial attention, which supports high-level processing, and identified some boundary conditions of attentional engagement. Their findings suggested that eye blinks interrupted attentional engagement, whereas attentional capture (shifting) was unaffected. Benedetto et al. [24] also suggested blink duration (BD) and blink rate (BR) as more profound and trustworthy indicators of the driver's visual workload.

In the past decade, detection of the driver's attention has become an active research field. A broad review of different approaches for attention detection has been reported in [5]. These approaches are grouped into five categories: subjective, physiological, vehicle-based, visual behavioral, and hybrid. Subjective approaches involve detection of the driver's inattention through questionnaires, and feedback
are collected as rating scores [25]. However, subjective approaches are not effective for detecting driver inattention in real time; rather, they are useful for cross-validating the accuracy of other approaches. Physiological approaches depend on vital information such as heart rate, brain activity, skin conductance, etc. These approaches typically detect hypovigilance in a simulated human-machine system based on physiological signals such as RRV, ECG, EEG, and EOG [26]–[30]. These systems either use wires and electrodes running from the driver to the system, causing distraction and annoyance to the drivers [29], or require expensive wearable respiratory inductive plethysmography (RIP) bands [26], wireless headsets [27], [28] and EEG acquisition equipment [30]. Vehicle-based approaches involve evaluating driving behaviour such as steering wheel movements, changes in acceleration/speed, lane position changes and braking patterns over time to detect inattentiveness [31], [32]. Detecting inattentiveness based on driving behavior is not fully reliable because the level of errors may vary from person to person. Visual behavior-based approaches involve extracting visual features of the driver and have been used widely and effectively to detect the driver's inattentiveness [33]. For example, Ramírez et al. [34] and Takemura et al. [35] proposed methods that use head-mounted sensors. Several studies have recently been conducted on driver attention detection by integrating various approaches, for example, combining the driver's physiological signals and visual signals [36], the driver's physiological signals and driving contexts [37], or the driver's visual cues and driving patterns [38].

Both researchers and drivers find non-intrusive vision-based systems appealing for attentional state detection and monitoring. Recently, Gumaei et al. [16] developed a useful camera-based framework for real-time identification of drivers' distraction using two different deep learning models, a custom deep convolutional neural network (CDCNN) model and a visual geometry group-16 (VGG16)-based fine-tuned model, and classified drivers' behaviour into 10 categories. An Interwoven Convolutional Neural Network (InterCNN) was proposed by Zhang et al. [15] to classify driver behaviors in real time. This system can classify 9 different behaviors, which are the most frequently encountered cases of distracted behavior during driving. In [6], a fatigue detection system was proposed that classifies the behavior into three categories, namely normal, warning and danger, based on eye gaze in real time. Tran et al. [17] proposed a real-time driver distraction detection system that is able to identify 10 types of distractions through multiple cameras and a microphone, and also alerts the driver through a voice message. Alam et al. [7], [8] proposed a system that estimates the attentional states of the driver based on various visual cues. Shibli et al. [9] estimated the level of attention and detected fatigue during driving by assessing the eye aspect ratio (EAR) and head pose. Chien and Lee [12] proposed a system that detects situations in which the driver's eyes exhibit distraction for a long duration and generates an alarm. Mandal et al. [10] employed a classifier to identify drivers' state based on PERCLOS. Chowdhury et al. [13] proposed a framework to estimate the driver's attention in terms of facial angle and lip motion. Vicente et al. [14] reported a system based on head pose and gaze estimation that detects Eyes off the Road (EoR). Most of these studies investigated either distraction or sleepiness using only one or two visual parameters for the detection of drivers' attention. To overcome the inconvenience caused by physiological approaches, a vision-based physiological signal measurement system was proposed in [11] to estimate driver fatigue. The system uses only one camera to collect physiological information (e.g., the remote photoplethysmography (rPPG) signal) to estimate heart rate (HR) and pulse rate variability (PRV), and facial features to estimate the percentage of eyelid closure (PERCLOS) and the yawning state, in order to measure the fatigue state of the driver. The system was developed and tested in a controlled indoor simulated driving environment with sufficient light and a high-resolution camera, to avoid interference from the external environment and ambient light. The recommended condition for rPPG estimation is good lighting with a high-resolution, uncompressed camera. Therefore, when the lighting condition is flawed in a real driving scenario, an rPPG-based system may not function properly.

Currently available systems are either expensive and limited to special high-end car models, or affordable solutions that lack accuracy and robustness. That is what motivated us to focus on implementing a driver attention monitoring system to bridge the gap between affordability and availability on the one hand and functionality on the other. In this research, we focused on developing a vision-based system that extracts the driver's attentional cues/features and classifies the state into attentive, drowsy, fatigued and distracted. The system also alerts the driver in the event of any inattentional state such as drowsiness, fatigue or distraction.

III. PROPOSED ATTENTION MONITORING FRAMEWORK
The primary goal of this research is to develop a system that determines the attentional states of the driver during driving. In this work, we considered cues related to the eye, mouth and head regions. Fig. 1 shows a schematic illustration of the proposed attention monitoring framework.

Drivers' attention monitoring starts with capturing video input of the driver's frontal face for visual cues using a general-purpose webcam (Logitech C170) placed at a distance of 0.6 m − 0.9 m from the driver's face on the vehicle fascia (as shown in Fig. 2). The captured video sequence is sent to the next module for further processing.

Isolating the driver's face region throughout the monitoring process is the first important step. The video sequence is divided into frames, each frame is converted into grayscale, and then face detection and tracking are performed. The Viola-Jones face detection algorithm [39] is employed to detect faces. For each face in the frame, the algorithm returns the position of the detected face as a rectangle. When more than one face is detected, the largest rectangle (closest face) is determined using the integral image and is marked as the driver's face.
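As an illustration of this face-detection step, the following minimal sketch uses OpenCV's Haar-cascade implementation of the Viola-Jones detector and keeps the largest rectangle as the driver's face; it is a sketch under assumed detector parameters, not the authors' exact implementation.

    import cv2

    # Haar-cascade (Viola-Jones) frontal-face detector bundled with OpenCV
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_driver_face(frame_bgr):
        """Return the largest detected face rectangle (x, y, w, h), taken as the driver's face."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None  # no face in this frame
        # When several faces are detected, keep the one with the largest area (closest face).
        return max(faces, key=lambda r: r[2] * r[3])

Tracking of the selected rectangle across subsequent frames, as described above, is not shown in this sketch.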
As we need to extract cues from the face …

The average EAR of both eyes is calculated using (3):

EAR = (EAR_Right + EAR_Left) / 2. (3)

2) Estimation of Yawn Frequency
Excessive yawning is associated with fatigue. The yawning frequency (YF) for a time window of 60 seconds can be estimated by detecting yawning and incrementing the corresponding counter (YN), using the equation defined in (4):

YF = YN_{T_2}, (4)

where T_2 is the time window. Estimating the yawn frequency requires isolating the mouth region (MR) from the rest of the image (see Fig. 6), which is done using equations (5)-(6):

(x_1, y_1) = (x + w/4, y + 11h/16), (5)
(x_2, y_2) = (x + 3w/4, y + h), (6)

where (x, y) is the initial point of the detected face, and w and h are its width and height.

After isolating MR, the system detects the width of the opening of the mouth due to yawning. The area of the mouth is determined by performing grayscale conversion and histogram equalization; finally, an uneven segmentation (S_MR) of the unlighted region of MR (i.e., the inner part) is computed using the threshold (τ), which is set using Otsu's method [44]. The mouth area and the contour of the mouth …

FIGURE 7: Mouth region detected by the proposed system when the mouth is (a) closed and (b) open.

The ratio (R_M) of the height (H_M) to the width (W_M) of MR describes the degree of mouth opening and is used as an indication of yawning. Therefore, R_M can be defined by (7):

R_M = H_M / W_M. (7)

R_M is low (i.e., ≤ Th_Y) when the mouth is closed and vice versa. Here, the threshold value Th_Y is calculated empirically. The driver is considered to be yawning if the mouth state is found open for a significant number of successive frames (e.g., for 3 s). YN denotes the number of yawns, which is initially 0 and is incremented whenever a yawn is detected.
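A minimal sketch of how these eye and mouth parameters can be computed per frame is given below; the per-eye EAR formula follows [43], while the landmark ordering, the frame rate and the default value of Th_Y are assumptions made only for illustration.

    import numpy as np

    def eye_aspect_ratio(eye):
        # eye: six (x, y) landmark points of one eye, ordered as in [43]
        p1, p2, p3, p4, p5, p6 = [np.asarray(p, dtype=float) for p in eye]
        vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
        horizontal = np.linalg.norm(p1 - p4)
        return vertical / (2.0 * horizontal)

    def average_ear(left_eye, right_eye):
        # Equation (3): mean of the two per-eye EAR values
        return (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0

    def mouth_ratio(h_m, w_m):
        # Equation (7): R_M = H_M / W_M, low when the mouth is closed
        return h_m / float(w_m)

    class YawnCounter:
        """Increments YN when R_M stays above Th_Y for about 3 s of successive frames."""
        def __init__(self, th_y=0.6, fps=10, open_seconds=3.0):
            self.th_y = th_y                       # empirical threshold (assumed value)
            self.min_open_frames = int(open_seconds * fps)
            self.open_frames = 0
            self.yn = 0                            # yawn counter, initially 0

        def update(self, r_m):
            if r_m > self.th_y:
                self.open_frames += 1
                if self.open_frames == self.min_open_frames:
                    self.yn += 1                   # one yawn after a sustained opening
            else:
                self.open_frames = 0
            return self.yn

The yawning frequency YF over the 60-second window in (4) is then simply the value of YN accumulated within that window.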
3) Estimation of Gaze Direction
Gaze direction (GD) is used to detect the distraction state of the driver. Both the face direction (FD) and the eye gaze direction (EGD) are taken into account to estimate GD. Usually, the standard GD of a driver should be frontal; deviation from this typical position for a long period of time (T_3) is an indication of distraction. GD can be calculated by (8):

GD = {FD, EGD}_{T_3}. (8)

Eye centers are detected first to estimate the GD. To simplify the eye center detection, some preprocessing is performed on both eyes individually. First, the region of the eyes is isolated utilizing the six (x, y)-coordinates (see Fig. 5). Then, the eye regions are resized using bi-linear interpolation. To improve the contrast in the image, histogram equalization is performed, and skin pixels are eliminated by setting a threshold that depends on the highest count and the dimension of the equalized image. Finally, erosion followed by dilation is performed for noise removal. Fig. 8 shows the detection process of the right eye. After the preprocessing …

… the rotation matrix R is determined using the Rodrigues rotation in (11):

R = cos θ I + (1 − cos θ) r rᵀ + sin θ [ 0  −r_z  r_y ; r_z  0  −r_x ; −r_y  r_x  0 ], (11)

where I is the 3 × 3 identity matrix, θ = ‖r‖₂, and r = (r_x, r_y, r_z)ᵀ is the rotation vector scaled to unit length.
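Equation (11) is the standard Rodrigues formula for converting a rotation vector (as returned, for example, by a PnP-based head-pose solver) into a rotation matrix; the sketch below is numerically equivalent to OpenCV's cv2.Rodrigues and is only illustrative — how yaw, pitch and roll are subsequently read off R depends on the solver's axis convention, which we do not assume here.

    import numpy as np

    def rodrigues_to_matrix(rvec):
        """Rotation vector r -> rotation matrix R as in equation (11).

        Equivalent to cv2.Rodrigues(rvec)[0] in OpenCV.
        """
        rvec = np.asarray(rvec, dtype=float).reshape(3)
        theta = np.linalg.norm(rvec)                # theta = ||r||_2
        if theta < 1e-12:
            return np.eye(3)                        # no rotation
        rx, ry, rz = rvec / theta                   # unit rotation axis
        K = np.array([[0.0, -rz,  ry],
                      [ rz, 0.0, -rx],
                      [-ry,  rx, 0.0]])             # skew-symmetric term of (11)
        r = np.array([[rx], [ry], [rz]])
        return (np.cos(theta) * np.eye(3)
                + (1.0 - np.cos(theta)) * (r @ r.T)
                + np.sin(theta) * K)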
…laptop used 4 GB of RAM and a 64-bit Windows 10 operating system. A CMOS webcam (Logitech C170) was connected to the laptop through a USB 2.0 port. The detailed specification of the camera is given in Table 3. Considering the video capture resolution and focal length of the camera, it was mounted perpendicularly in front of the participant's face at a distance of 0.8 m from the car dashboard. During the evaluation, the participants were requested to sit in the driving seat in front of the camera. Fig. 11 illustrates the setting of the experiment. A 2011 Toyota Allion was used in the experimental trials.

TABLE 3: Specification of Logitech C170.
Feature | Description
Sensor type | VGA sensor
Video capture resolution | 1024 x 768
Focus type | Fixed
Focal length | 2.3 mm
Maximum frame rate | 30 fps
Additional features | Logitech fluid crystal technology

FIGURE 11: Experimental setup.

…iments. Seven of them were male and three were female. They had no disabilities, and no remuneration was paid to them. Each driver gave his/her consent before participating. We gave a brief description of the system and explained the whole procedure before conducting the experiments.

C. EXPERIMENT 1: TO VALIDATE THE ESTIMATED PARAMETERS
To investigate the validity of the estimated parameters PERCLOS, YF, and GD, we conducted an experiment with different participants under various lighting conditions.

1) Experimental Procedure and Data Collection
Three participants (one female and two male) with distinct facial appearances and hairstyles took part in the experiment. During the test, each participant was asked to spend approximately 5 minutes with the system. All of them participated bare-faced and wearing accessories such as sunglasses, spectacles, and caps. A total of 17 minutes 40 seconds (5 minutes 35 seconds for Participant-1 + 6 minutes for Participant-2 + 6 minutes 5 seconds for Participant-3) of video with involuntary eye blinks and spontaneous yawns was captured by the camera in front of the driver to verify the estimated parameters' correctness under different daylight conditions. Ideally, the frame rate of the camera is 30 fps, but in practice the system cannot capture video at this rate due to the slow video-capturing hardware, the contents of the video, and the computational overhead. A total of 10,805 frames (at an average of 10 fps) were captured and analyzed for this experiment. Table 4 and Table 5 provide the number of frames analyzed separately in each case.

2) Evaluation Measures
Validation of the parameters is done based on two criteria, the false positive rate (FPR) and the false negative rate (FNR), defined in (16) and (17); a small computational sketch follows the definitions below.

FPR = F_Pos / (F_Pos + T_Neg), (16)
FNR = F_Neg / (F_Neg + T_Pos). (17)

Here,
• T_Pos is the number of positive instances in a testing sequence whose extracted cues were correctly recognized by the system (for example, the driver is yawning and the system detects it correctly);
• T_Neg is the number of negative instances in a testing sequence which were correctly recognized by the system (for example, the driver's eyes are closed and the system detects them as closed);
• F_Pos is the number of positive instances in a testing sequence which were wrongly recognized by the system (for example, the driver's eyes are closed but the system detects them as open);
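The two measures in (16) and (17) can be computed directly from the counts defined above; a minimal sketch (the variable names are ours):

    def false_positive_rate(f_pos, t_neg):
        # Equation (16): FPR = F_Pos / (F_Pos + T_Neg)
        return f_pos / float(f_pos + t_neg)

    def false_negative_rate(f_neg, t_pos):
        # Equation (17): FNR = F_Neg / (F_Neg + T_Pos)
        return f_neg / float(f_neg + t_pos)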
3) Results
The accuracies of PERCLOS, YF and GD depend on the accuracy of extracting the cues (eye state, yawn frequency, eye gaze direction and face direction, respectively) and of estimating SoA and LoA. In the first stage, the techniques used to extract the cues were evaluated, and then the video sequences were investigated to validate the estimated parameters subjectively.

a: Face Detection, Tracking and ROI Extraction
Tests were conducted on different participants in different situations to establish the system's (i) face detection and tracking and (ii) region of interest (ROI) (i.e., eye, mouth and head region) extraction capabilities. Fig. 12 shows some sample frames where the face of a subject in different situations is detected and tracked and the ROI is extracted. Observations from this test suggest that the algorithms used for detection, tracking and extraction function satisfactorily and accommodate participants in different situations.

b: Cues Extraction
The correctness of the extracted cues (i.e., eye state, yawn detection, eye gaze direction and face direction) is then investigated by means of FPR and FNR.
• In the case of eye state detection, an FPR error occurs when the eyes are in the open state but are detected as closed, and an FNR error occurs when the eyes are in the closed state but the system detects them as open. Table 6 presents the percentages of FPR and FNR. The results suggest that wearing a cap does not affect eye state detection much, but the FNR of eye state detection is greater when wearing spectacles than without them. Our system is unable to detect the eye state when the subject wears sunglasses. Eye state detection was not affected by the lighting condition. Fig. 13 shows the detection of eye state (both closed and open) by the system.

TABLE 6: Summary of eye state detection.
Situation | FPR (%) | FNR (%)
Subject without spectacles | 1.2 | 3.6
Subject with spectacles | 35 | 18.2
Subject wearing cap | 1.8 | 4.1

• In counting the number of yawns, a false positive error occurs when a yawn does not happen but the system detects one, and a false negative error occurs when a yawn occurs but the system does not detect it. A summary of yawn detection under different lighting conditions is shown in Table 7. The results revealed that wearing spectacles, sunglasses or a cap does not affect yawn detection, but varying lighting conditions do affect it.

TABLE 7: Summary of yawning detection.
Lighting condition | FPR (%) | FNR (%)
Broad daylight | 3.5 | 9.3
Parking garage | 7.4 | 12.5

• In the case of eye gaze direction and face direction detection, a false positive occurs when the system detects the participant's head/eyes in the standard direction while they are actually in a different direction. On the other hand, a false negative occurs when the head or eyes are in the standard position but are classified as left or right by the system. Table 8 and Table 9 show the percentages of FPR and FNR for the eye gaze direction and face direction detection modules. As in eye state detection, a participant wearing spectacles affected the performance of the eye gaze detection algorithm, whereas the face direction detection algorithm was not much affected by any of the situations.

TABLE 8: Experimental outcomes for eye gaze detection.
Situation | FPR (%) | FNR (%)
Participant without spectacles | 3.8 | 5.1
Participant with spectacles | 16.7 | 12.5
Participant wearing cap | 3 | 5.7

We also performed an analysis to determine the validity of the detection angle of FD. To detect the head orientation for FD estimation, we used a standard solution …
FIGURE 12: (a) Face detection and tracking, and (b) ROI extraction of a participant bare-faced and wearing different accessories such as spectacles, sunglasses and a cap.
FIGURE 14: Yaw estimation for different head orientations: (a) −90°, (b) −60°, (c) −32°, (d) 0°, (e) 30°, (f) 61°, (g) 90°.
…seconds) under consideration, but it requires 2 seconds of observation for GD estimation. Therefore, GD estimation requires observing the past few frames of EGD and FD (i.e., from frame number 39, 95 seconds). We can see that the participant's EGD measurements were within the threshold range (−8° < θ < 8°), but the FD measurements exceeded the threshold value (i.e., −30°), which resulted in GD being classified as left (third label from the top, marked with a red circle in Fig. 17a). The estimated SoA is distracted, which generates an alarm.
b) Estimation of LoA: The lower part of Fig. 17b presents the LoA (in %). The LoA measurements (red solid line) in Fig. 17b also show variation in values throughout the whole sequence with respect to the other measured parameters. Observing the last few frames, i.e., from frame number 39 (95 seconds), we can see that after reaching its peak, the LoA started to decrease again with respect to the measurements of GD (i.e., FD), and the LoA reached 83.33%.

(a) Analysis of a frame (frame 42): four insets on the left represent face detection and tracking (top) and analysis of the eye, head and mouth regions (bottom), respectively; five sub-graphs in the middle represent Frame Rate, PERCLOS, Yawn Frequency, Gaze Direction, and Level of Attention from the top, respectively; four labels on the right represent the frame number, YF, GD and LoA (in %).

The LoA depends not only on the current measurements of the parameters but also on the values from past frames. It is worth mentioning that the correlation between PERCLOS and the two attentional states (fatigue and drowsiness) needs to be further investigated using human subjects with real sleep deprivation.

D. EXPERIMENT 2: TO EVALUATE ATTENTIONAL STATE
The purpose of this experiment is to measure the accuracy of the proposed system in detecting four kinds of attentional status: attentiveness, drowsiness, fatigue, and distraction.

1) Experimental Procedure and Data Collection
The experiment was performed in a controlled environment considering the risk that a driver's inattentive state poses to traffic safety. We requested all the participants to pose various expressions which exhibit the different attentional states. Each participant spent approximately 3.5 minutes on average to simulate each state. A total of 140 minutes (10 [participants] × 3.5 [interaction time] × 4 [types of status]) of video was captured by the camera placed perpendicular to the driver's face for analysis. More than 85K frames (for details, see Table 10) were analyzed for this experiment.

2) Evaluation Measures
We measured the accuracy (A) of each attentional state using (18):

A = (DF / TF) × 100%, (18)

where DF is the total number of frames in which the attentional state is correctly detected and TF is the total number of frames in a testing sequence.

3) Results
The accuracy for the different kinds of attentional state is shown in Table 10. The results revealed that the proposed system performs quite satisfactorily in estimating the states of attention. The average accuracy in classifying the attentional states ranges from 91% to 95%. The system demonstrated a relatively higher accuracy of 95% for detecting the drowsy state than for the other three states. As the experiment was performed in a controlled environment, the results may vary, especially under real sleep deprivation.
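A minimal sketch of the GD-based distraction decision discussed in the Experiment 1 analysis above: the angular thresholds (|EGD| < 8°, |FD| ≤ 30°) and the 2-second observation window are taken from the text, the frame rate is assumed, and the exact decision logic of the monitoring module is our assumption rather than the authors' implementation.

    from collections import deque

    class DistractionMonitor:
        """Flags SoA = 'distracted' when gaze direction stays off-frontal for T3 seconds."""
        def __init__(self, egd_limit=8.0, fd_limit=30.0, t3_seconds=2.0, fps=10):
            self.egd_limit = egd_limit                  # eye-gaze threshold (degrees)
            self.fd_limit = fd_limit                    # face-direction threshold (degrees)
            self.window = deque(maxlen=int(t3_seconds * fps))

        def update(self, egd_deg, fd_deg):
            off_frontal = abs(egd_deg) >= self.egd_limit or abs(fd_deg) >= self.fd_limit
            self.window.append(off_frontal)
            # Distracted only if the entire T3 window is off-frontal; the caller raises the alarm.
            if len(self.window) == self.window.maxlen and all(self.window):
                return "distracted"
            return "attentive"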
…may be installed in vehicles and make a substantial impact in reducing road crashes and saving human lives.

VI. ACKNOWLEDGEMENT
This research work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant RGPIN-2020-06080.

REFERENCES
[1] "The global status report on road safety 2018," World Health Organization (WHO), Geneva 27, Switzerland, WHO Report, Dec. 2018.
[2] P. T. Gimeno, G. P. Cerezuela, and M. C. Montañés, "On the concept and measurement of driver drowsiness, fatigue and inattention: implications for countermeasures," Int. J. Veh. Des., vol. 42, no. 1/2, pp. 67–68, 2006.
[3] M. A. Regan, C. Hallett, and C. P. Gordon, "Driver distraction and driver inattention: Definition, relationship and taxonomy," Accident Analysis & Prevention, vol. 43, no. 5, pp. 1771–1781, 2011.
[4] M. Gastaldi, R. Rossi, and G. Gecchele, "Effects of driver task-related fatigue on driving performance," Procedia - Social and Behavioral Sci., vol. 111, pp. 955–964, 2014.
[5] Y. Dong, Z. Hu, K. Uchimura, and N. Murayama, "Driver inattention monitoring system for intelligent vehicles: A review," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 596–614, June 2011.
[6] Y. Wang, R. Huang, and L. Guo, "Eye gaze pattern analysis for fatigue detection based on gp-bcnn with esm," Pattern Recognition Lett., vol. 123, pp. 61–74, 2019.
[7] L. Alam and M. M. Hoque, "Vision-based driver's attention monitoring system for smart vehicles," in Proc. Intell. Computing & Optimization (ICO'18), P. Vasant, I. Zelinka, and G.-W. Weber, Eds. Cham: Springer International Publishing, 2019, pp. 196–209.
[8] L. Alam and M. M. Hoque, "Real-time distraction detection based on driver's visual features," in Proc. IEEE Int. Conf. Elect. Comput. Commun. Eng. (ECCE'19), Feb 2019, pp. 1–6.
[9] A. M. Shibli, M. M. Hoque, and L. Alam, "Developing a vision-based driving assistance system," in Proc. Emerging Technolog. Data Mining Inform. Security, A. Abraham, P. Dutta, J. K. Mandal, A. Bhattacharya, and S. Dutta, Eds. Singapore: Springer Singapore, 2019, pp. 799–812.
[10] B. Mandal, L. Li, G. S. Wang, and J. Lin, "Towards detection of bus driver fatigue based on robust visual analysis of eye state," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 3, pp. 545–557, March 2017.
[11] Y. Tsai, P. Lai, P. Huang, T. Lin, and B. Wu, "Vision-based instant measurement system for driver fatigue monitoring," IEEE Access, vol. 8, pp. 67342–67353, 2020.
[12] J.-C. Chien, Y.-S. Chen, and J.-D. Lee, "Improving night time driving safety using vision-based classification techniques," Sensors, vol. 17, no. 10, 2017.
[13] P. Chowdhury, L. Alam, and M. M. Hoque, "Designing an empirical framework to estimate the driver's attention," in Proc. IEEE Int. Conf. Inform. Electron. Vision (ICIEV'16), May 2016, pp. 513–518.
[14] F. Vicente, Z. Huang, X. Xiong, F. De la Torre, W. Zhang, and D. Levi, "Driver gaze tracking and eyes off the road detection system," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 2014–2027, Aug 2015.
[15] C. Zhang, R. Li, W. Kim, D. Yoon, and P. Patras, "Driver behavior recognition via interwoven deep convolutional neural nets with multi-stream inputs," IEEE Access, vol. 8, pp. 191138–191151, 2020.
[16] A. Gumaei, M. Alrakhami, M. Hassan, A. Alamri, M. Alhussein, M. A. Razzaque, and G. Fortino, "A deep learning-based driver distraction identification framework over edge cloud," Neural Computing and Applications, pp. 1–16, Sep. 2020.
[17] D. Tran, H. Manh Do, W. Sheng, H. Bai, and G. Chowdhary, "Real-time detection of distracted driving based on deep learning," IET Intell. Transp. Syst., vol. 12, no. 10, pp. 1210–1219, 2018.
[18] S. Kaplan, M. A. Guvensan, A. G. Yavuz, and Y. Karalurt, "Driver behavior analysis for safe driving: A survey," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3017–3032, 2015.
[19] D. S. Bowman, W. A. Schaudt, and R. J. Hanowski, Handbook of Intelligent Vehicles. London: Springer London, 2012, ch. Advances in Drowsy Driver Assistance Systems Through Data Fusion, pp. 895–912.
[20] S. Gupta and S. Mittal, "Yawning and its physiological significance," Int. J. Appl. Basic Med. Res., vol. 3, no. 1, pp. 11–15, 2013.
[21] J. Grippenkoven and S. Dietsch, "Gaze direction and driving behavior of drivers at level crossings," J. Transp. Safety & Security, vol. 8, no. sup1, pp. 4–18, 2016.
[22] J. G. Taylor and N. F. Fragopanagos, "The interaction of attention and emotion," Neural Netw., vol. 18, no. 4, pp. 353–369, 2005.
[23] A. Zivony, A. S. Allon, R. Luria, and D. Lamy, "Dissociating between the n2pc and attentional shifting: An attentional blink study," Neuropsychologia, vol. 121, pp. 153–163, 2018.
[24] S. Benedetto, M. Pedrotti, L. Minin, T. Baccino, A. Re, and R. Montanari, "Driver workload and eye blink duration," Transp. Res. Part F: Traffic Psychology and Behaviour, vol. 14, no. 3, pp. 199–208, 2011.
[25] E. D. Valck and R. Cluydts, "Slow-release caffeine as a countermeasure to driver sleepiness induced by partial sleep deprivation," J. Sleep Res., vol. 10, no. 3, pp. 203–209, 2001.
[26] F. Guede-Fernandez, M. Fernandez-Chimeno, J. Ramos-Castro, and M. A. Garcia-Gonzalez, "Driver drowsiness detection based on respiratory signal analysis," IEEE Access, vol. 7, pp. 81826–81838, 2019.
[27] J. Moon, Y. Kwon, J. Park, and W. C. Yoon, "Detecting user attention to video segments using interval eeg features," Expert Syst. Appl., vol. 115, pp. 578–592, 2019.
[28] G. Li and W. Chung, "Combined eeg-gyroscope-tdcs brain machine interface system for early management of driver drowsiness," IEEE Trans. Human-Mach. Syst., vol. 48, no. 1, pp. 50–62, Feb 2018.
[29] A. Sahayadhas, K. Sundaraj, M. Murugappan, and R. Palaniappan, "Physiological signal based detection of driver hypovigilance using higher order spectra," Expert Syst. Appl., vol. 42, no. 22, pp. 8669–8677, 2015.
[30] J. Zhang, Z. Yin, and R. Wang, "Recognition of mental workload levels under complex human–machine collaboration by using physiological features and adaptive support vector machines," IEEE Trans. Human-Mach. Syst., vol. 45, no. 2, pp. 200–214, April 2015.
[31] A. Alamri, A. Gumaei, M. Al-Rakhami, M. M. Hassan, M. Alhussein, and G. Fortino, "An effective bio-signal-based driver behavior monitoring system using a generalized deep learning approach," IEEE Access, vol. 8, pp. 135037–135049, 2020.
[32] C. C. Liu, S. G. Hosking, and M. G. Lenné, "Predicting driver drowsiness using vehicle measures: Recent insights and future challenges," J. Safety Res., vol. 40, no. 4, pp. 239–245, 2009.
[33] A. Fernández, R. Usamentiaga, J. L. Carús, and R. Casado, "Driver distraction using visual-based sensors and algorithms," Sensors, vol. 16, no. 11, 2016.
[34] J. M. Ramirez, M. D. Rodriguez, A. G. Andrade, L. A. Castro, J. Beltran, and J. S. Armenta, "Inferring drivers' visual focus attention through head-mounted inertial sensors," IEEE Access, vol. 7, pp. 185422–185432, 2019.
[35] K. Takemura, K. Takahashi, J. Takamatsu, and T. Ogasawara, "Estimating 3-d point-of-regard in a real environment using a head-mounted eye-tracking system," IEEE Trans. Human-Mach. Syst., vol. 44, no. 4, pp. 531–536, Aug 2014.
[36] M. Gjoreski, M. Z. Gams, M. Lustrek, P. Genc, J. Garbas, and T. Hassan, "Machine learning and end-to-end deep learning for monitoring driver distractions from physiological and visual signals," IEEE Access, vol. 8, pp. 70590–70603, 2020.
[37] Z. Guo, Y. Pan, G. Zhao, S. Cao, and J. Zhang, "Detection of driver vigilance level using eeg signals and driving contexts," IEEE Trans. Rel., vol. 67, no. 1, pp. 370–380, March 2018.
[38] Y. Yao, X. Zhao, H. Du, Y. Zhang, and J. Rong, "Classification of distracted driving based on visual features and behavior data using a random forest method," Transp. Res. Rec., vol. 2672, no. 45, pp. 210–221, 2018.
[39] P. Viola and M. J. Jones, "Robust real-time face detection," Int. J. of Comput. Vision, vol. 57, no. 2, pp. 137–154, May 2004.
[40] M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Discriminative scale space tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 8, pp. 1561–1575, Aug 2017.
[41] V. Kazemi and J. Sullivan, "One millisecond face alignment with an ensemble of regression trees," in Proc. IEEE Conf. Comput. Vision Pattern Recognition (CVPR'14). Washington, DC, USA: IEEE Computer Society, 2014, pp. 1867–1874.
[42] C. Sagonas, G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "A semi-automatic methodology for facial landmark annotation," in Proc. IEEE Conf. Comput. Vision Pattern Recognition Workshops, June 2013, pp. 896–903.
[43] T. Soukupova and J. Cech, "Eye blink detection using facial landmarks," in Proc. 21st Comput. Vision Winter Workshop, Rimske Toplice, Slovenia, Feb 2016.
[44] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, Jan 1979.
[45] S. Suzuki and K. Abe, "Topological structural analysis of digitized binary images by border following," Comput. Vision Graph. Image Process., vol. 30, no. 1, pp. 32–46, 1985.
[46] S.-T. Wu, A. C. G. da Silva, and M. R. G. Marquez, "The douglas-peucker algorithm: sufficiency conditions for non-self-intersections," J. Brazilian Comput. Soc., vol. 9, pp. 67–84, Apr. 2004.
[47] Ming-Kuei Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, February 1962.
[48] F.-J. Chang, A. T. Tran, T. Hassner, I. Masi, R. Nevatia, and G. Medioni, "Faceposenet: Making a case for landmark-free face alignment," in Proc. IEEE Int. Conf. on Comput. Vision Workshops (ICCVW'17), Oct 2017.
[49] J. Jiménez-Pinto and M. Torres-Torriti, "Optical flow and driver's kinematics analysis for state of alert sensing," Sensors, vol. 13, no. 4, pp. 4225–4257, 2013.
[50] "Road vehicles — Measurement and analysis of driver visual behaviour with respect to transport information and control systems," Standard ISO 15007:2020, 2020.

LAMIA ALAM received her B.Sc. and M.Sc. degrees in Computer Science and Engineering from Chittagong University of Engineering & Technology (CUET), Bangladesh, in 2014 and 2018, respectively. Currently, she is working as an Assistant Professor in the Department of Computer Science and Engineering (CSE) at Chittagong University of Engineering & Technology (CUET), Bangladesh. Her research interests include Human-Computer Interaction (HCI), Computer Vision (CV), and Machine Learning (ML). Ms. Alam is a member of IEEE and an associate member of the Institution of Engineers, Bangladesh (IEB).
MOHAMMED MOSHIUL HOQUE is a Distinguished Professor of the Department of Computer Science & Engineering (CSE), CUET. He received his Ph.D. from the Dept. of Information and Computer Sciences, Saitama University, Japan, in 2012. He is the former Head of the Department of Computer Science & Engineering, CUET. Currently, he is serving as the Director of Students' Welfare, CUET, and Director of the Sheikh Kamal IT Business Incubator at CUET. He has served as TPC Chair, TPC Co-chair, Publication Chair and TPC member in several international conferences. Dr. Hoque was the Award Coordinator (2016-17), Conference Coordinator (2017-18) and Vice-chair (Technical) (2018-20) of the IEEE Bangladesh Section. Moreover, he served as Vice-chair (Activity) (2018-19) and Award Coordinator (2017-18) of the IEEE Computer Society Bangladesh Chapter, and as Educational Activity Coordinator (2017-20) of the IEEE Robotics & Automation Society, Bangladesh Chapter, respectively. He is the founding Director of the CUET Natural Language Processing Lab and Fab Lab CUET. He has published more than 125 papers in international journals and conferences. His research interests include Human-Robot/Computer Interaction, Artificial Intelligence, Machine Learning, and Natural Language Processing. Dr. Hoque is a senior member of IEEE, the IEEE Computer Society, the IEEE Robotics and Automation Society, IEEE Women in Engineering, and the IEEE Signal Processing Society, USA, and a Fellow of the Institute of Engineers, Bangladesh.

NAZMUL SIDDIQUE is with the School of Computing, Engineering and Intelligent Systems, Ulster University. He obtained his Dipl.-Ing. degree in Cybernetics from TU Dresden, Germany, his MSc in Computer Science from BUET, Bangladesh, and his PhD in Intelligent Control from the Department of Automatic Control and Systems Engineering, University of Sheffield, England. His research interests include robotics, cybernetics, computational intelligence, nature-inspired computing, stochastic systems and vehicular communication. He has published over 170 research papers in the broad area of computational intelligence, vehicular communication, robotics and cybernetics. He has authored and co-authored five books published by John Wiley, Springer and Taylor & Francis. He has guest edited eight special issues of reputed journals on Cybernetic Intelligence, Computational Intelligence, Neural Networks and Robotics. He has been involved in organizing many national and international conferences and has co-edited seven conference proceedings. Dr. Siddique is a Fellow of the Higher Education Academy, a senior member of IEEE and a member of different committees of the IEEE SMCS. He is on the editorial board of Nature Scientific Research, the Journal of Behavioural Robotics, Engineering Letters, the International Journal of Machine Learning and Cybernetics, the International Journal of Applied Pattern Recognition, and the International Journal of Advances in Robotics Research, and also on the editorial advisory board of the International Journal of Neural Systems.