Real-Time Driver Drowsiness Detection Using Hybrid CNN-LSTM Model with Facial Feature and Behavioral Analysis
Proceedings of the Fourth International Conference on Ubiquitous Computing and Intelligent Information Systems (ICUIS-2024)
IEEE Xplore Part Number: CFP24UG3-ART; ISBN: 979-8-3315-2963-5
979-8-3315-2963-5/24/$31.00 ©2024 IEEE
Authorized licensed use limited to: MIT-World Peace University. Downloaded on February 19, 2025 at 18:15:43 UTC from IEEE Xplore. Restrictions apply.

Abstract—Driver drowsiness is a leading cause of road accidents and is responsible for around 20 percent of fatal accidents worldwide. Early detection of driver tiredness is crucial for preventing road accidents and enhancing road safety. Traditional detection methods, such as self-reporting, physiological measures and vehicle-based measures, are either unreliable or intrusive, which limits their effectiveness. In this paper, we develop a deep learning-based technique that detects driver drowsiness in real time using a non-intrusive method. The proposed approach combines physiological monitoring and computer vision to analyze facial features, head poses, blink duration, blink rate, rate of eye closure and yawning using a camera inside the vehicle. These features are processed by deep learning models suited to computer vision: a CNN combined with an LSTM captures temporal patterns and is trained on a diverse dataset to predict drowsiness levels. The results indicate that the hybrid CNN-LSTM model is effective in detecting driver drowsiness.

Index Terms—Driver Drowsiness Detection, Eye-Blink Monitoring, Fatigue Detection Systems, Behavioral Analysis, Eye Movements, Transfer Learning

I. INTRODUCTION
Driver fatigue is a major contributor to the high frequency of road accidents [1]. Prolonged periods of continuous driving often lead to significant fatigue accumulation, which causes drowsiness and can make drivers lose control of the vehicle. According to one study, more than 50% (3.5–67.3%) of users agreed that they had been sleep deprived [1].
Developing a non-intrusive device to detect driver drowsiness presents significant challenges. Thousands of people have died in accidents caused by drowsy drivers, particularly drivers of goods vehicles such as trucks, for whom frequent night trips often result in sleep deprivation and an increased risk of fatigue-related accidents.
Intelligent Transportation Systems strive to ensure public safety and reduce road accidents. A major cause of accidents on rural roads is driver monotony and sleepy behavior while driving. Prolonged driving significantly impairs a driver's judgment and decision-making abilities. Common causes of driver drowsiness include extended driving hours, alcohol consumption, certain medications and driving after a heavy meal [1].
Many researchers around the globe are investigating methods to detect driver drowsiness and ways to solve the problem. These methods are generally of three types:
1. Physiological Measures
These involve measurements obtained by placing intrusive electronic devices on the driver's skin, such as EEG (Electroencephalography), ECG (Electrocardiography) and EOG (Electrooculography) techniques. Their high cost and intrusive nature limit their practical applications [2].
2. Vehicle-based Measures
These are based on vehicle control signals, including braking patterns, steering patterns and lane-departure movements. These techniques provide better results and are non-invasive compared with physiological measures, but they can give inaccurate results because they depend on road conditions and driver behavior [2].
3. Behavioral Measures
These techniques rely on computer vision, i.e., visual analysis of the driver's behavior using features such as the eyes, mouth and head. They are the most effective at extracting the relevant visual features that usually indicate a driver's degree of sleepiness [2], [3].
This study uses behavioral parameters to detect drowsiness in drivers with a deep learning approach. First, we use an eye-based parameter, the Eye Aspect Ratio (EAR), which is the ratio of the height of the eye to its width, calculated using facial landmark or shape detection. This parameter measures the openness of the eye in each frame, and a sharp drop in a single frame indicates an eye blink. Time-series analysis is
used to distinguish normal from abnormal blinks, where an abnormal blink signifies drowsiness [2], [4].
Next, to detect whether a person is yawning, we use the Mouth Aspect Ratio (MAR). MAR is computed as the ratio of the height of the mouth (the vertical distance between the lips) to its width (the distance from one corner of the lips to the other), and an abnormal increase in its value sustained over a period indicates yawning.
Finally, head tilt or movement detection is used to determine whether the person is drowsy from head positions at different intervals of time. For example, head-nodding frequency is directly proportional to drowsy behavior, and it increases exponentially due to backward bending. Together, these measures provide a more accurate model for determining in real time whether the driver is drowsy.
To address the research gaps identified in previous studies, this study develops a hybrid deep learning model using a CNN-LSTM approach. The model integrates multimodal behavioral parameters, including Eye Aspect Ratio (EAR), Mouth Aspect Ratio (MAR) and head movements, to enhance the accuracy of driver drowsiness detection.

II. LITERATURE REVIEW
In recent decades, driver drowsiness detection systems have gained wide attention because of their crucial role in enhancing road safety. With the continuous development of artificial intelligence and deep learning algorithms, many innovative solutions for real-time driver fatigue detection have been proposed. These models focus on increasing accuracy, reducing false alerts, wider availability and integration with modern automobile systems.
Das et al. (2024) [5] presented a cutting-edge method that integrates CNN-LSTM and U-Net architectures for real-time detection of driver drowsiness through the study of facial movements. Their system achieved an accuracy of 98.8%, outperforming models such as VGG-16, GoogLeNet, AlexNet and ResNet50. The primary advantage of this method is its high accuracy while remaining non-intrusive. However, environmental factors such as lighting conditions, together with its computationally intensive nature, make the system sensitive and cause detection problems.
Li et al. (2023) [6] studied Support Vector Machines (SVM) for drowsiness detection, achieving an accuracy of 91.92%. Integrating SVM with other driver assistance systems is a very practical solution, but it faces problems such as the need for extensive training and the potential for external factors such as stress or fatigue to influence the results. As a result, it demands computationally intensive hardware, which limits wider adoption.
Pauly et al. (2023) [7] also employed SVMs, along with Histogram of Oriented Gradients (HOG) features, to detect drowsiness in real time with an accuracy of 91.6%. The method's real-time capability makes it valuable for early alert systems, though its efficacy can be compromised under varying head tilts and lighting conditions. The study is limited by its small datasets and highlights a research gap: training the classifier on diverse datasets could improve its robustness in varied environments.
Rahman et al. (2022) [8] studied a real-time eye-blink monitoring system based on the Viola–Jones and AdaBoost algorithms, enhanced with Haar classifiers, which achieved an accuracy of 94%. The system's performance is affected by distracting factors such as obstructions and occlusions. The authors emphasize that more advanced monitoring techniques, capable of accommodating a broader range of driver behaviors, need to be researched and applied for better performance.
Similarly, Coetzer et al. (2022) [9] developed a hybrid drowsiness detection system based on Artificial Neural Networks (ANN), SVM and AdaBoost, achieving an accuracy of 98.1%. Their model performs well under challenging conditions such as low lighting, though it has problems with obstructions and head tilting during eye detection.
Weng et al. (2022) [10] focused on improving detection accuracy by combining facial expressions with other biometric signals in a Multimodal Emotion Recognition system. This approach enhanced the system's detection capabilities. However, its strong focus on emotion recognition rather than fatigue detection makes it a less convincing basis for a driver drowsiness detection system, and it lacks temporal analysis of cues such as blink rate and yawning.
Anjali et al. (2021) [11] applied the Viola–Jones algorithm with a Haar Cascade Classifier to develop a reliable method for detecting driver drowsiness. Despite its impressive performance and reliability, which can help reduce drowsiness-related accidents, the extensive training on large datasets has become a bottleneck for effective operation in real-time environments. The model also lacks robustness.
Mungra et al. (2020) [12] investigated whether CNN-based emotion recognition techniques can help detect behaviors linked to fatigue. While their method demonstrates high accuracy in recognizing emotional states, the limited exploration of data augmentation techniques and alternative CNN architectures leaves considerable room for further research, both to develop methods more closely related to drowsiness detection and to address the gaps of the paper.
Magan et al. (2022) [13] used Dlib, EfficientNet and Recurrent Neural Networks to develop a non-intrusive, camera-based detection system. Although it has proven effective, interference from external factors such as obstructions and positional logic makes the system prone to extreme observations. One suggestion for further development was to try alternative feature selection techniques to improve detection accuracy.
Lastly, Punitha et al. (2020) [14] presented an approach based on Viola–Jones combined with Face Cascade Classifiers and SVM that achieved 93.5% accuracy in detecting drowsiness through eye-state analysis. While this method is effective,
ambient factors such as lighting and occlusions may cause fluctuations in its results. The authors suggest using a more diverse dataset and further development to handle diverse environmental conditions.

Recent developments in driver drowsiness detection systems have brought significant improvements in accuracy and real-time monitoring. Previous studies are reviewed in Table I. Our study aims to tackle the challenges in the current methods; in particular, there is a need to focus on integrating multimodal data and on more sophisticated temporal analysis techniques.

TABLE I
REVIEW OF PREVIOUS STUDIES ON DRIVER DROWSINESS DETECTION
Ref. | Methodology | Accuracy | Advantages | Limitations/Challenges
[5] | CNN-LSTM | 98.8% | High accuracy, real-time detection, non-intrusive, integrates with ADAS systems | Limited exploration of different CNN architectures and temporal dynamics integration
[6] | SVM | 91.92% | Can integrate with other driver assistance systems | Need for improved generalization in real-world driving scenarios
[7] | Histogram of Oriented Gradients, SVM | 91.6% | Detects drowsiness in real time | Difficulty in detecting under different head poses and lighting conditions
[8] | Viola–Jones, AdaBoost, Haar Classifier | 94% | Real-time detection through eye-blink monitoring | More sophisticated monitoring needed for different driver behaviours
[9] | Artificial Neural Networks, SVM, AdaBoost | 98.1% | Can handle challenging environmental conditions such as poor lighting | Lack of robustness in real-world driving situations
[10] | Multimodal Emotion Recognition | RR: 93% | Improved accuracy through fusion of multiple modalities | Improper use of deep learning architectures and lack of focus on drowsiness detection through temporal analysis
[11] | Viola–Jones, Haar Cascaded Classifier | - | Can minimize accidents caused by drowsiness | Needs improvement in robustness for general use in real-time systems
[12] | CNN-based Emotion Recognition | 78.52% | High accuracy in detecting emotional states | Investigation needed on the impact of different CNN architectures
[13] | CNN | 93.0% | Non-intrusive detection using cameras | Limited exploration of alternative feature extraction techniques
[14] | Viola–Jones, Haar Cascade Classifiers, SVM | 93.5% | Effective analysis of eye status | Integration with other data sources for more accurate detection

III. METHODOLOGY
The proposed methodology combines CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) architectures to capture spatial and temporal features from facial landmarks extracted using MediaPipe [15]. The proposed model is shown in Fig. 1. The approach is detailed as follows.

A. Dataset Collection and Preparation
Images were collected from the UTARLDD dataset and organized into two separate folders, ‘drowsy’ and ‘non-drowsy’, corresponding to the two states. The images were preprocessed so that key facial landmarks could be extracted using MediaPipe's ‘FaceMesh’ module, which captures 468 facial landmarks from these 2D images. For each image, the facial landmarks were used to compute three key features:
1. Eye Aspect Ratio (EAR): An indicator of blinking behavior, calculated from specific eye landmarks. It captures changes in eye openness [16], [17].
2. Mouth Aspect Ratio (MAR): Captures mouth movements, specifically yawning, by analyzing distances between mouth landmarks [18].
3. Head Pose Ratio: Analyzes the relative positions of the nose and chin, indicating head tilts and movement patterns relevant to drowsiness [19].

B. Landmark Extraction
Facial landmarks were extracted using MediaPipe, and EAR, MAR and head pose were calculated using custom functions to track key facial features associated with drowsiness. For each image, a feature vector was formed by concatenating the facial landmarks and the three features (EAR, MAR, head pose).

D. Data Splitting
F. Training
The model was trained using the Adam optimizer and the binary cross-entropy loss function. To prevent overfitting and help the model generalize, early stopping and learning-rate reduction were employed. Early stopping monitored the validation loss, halted training after 15 epochs without improvement and restored the best weights, while the learning rate was reduced by a factor of 0.2 after 5 epochs of plateauing.
Fig. 3. Training and Validation Curves
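The interaction between the two callbacks described above can be illustrated with a small, framework-free simulation. This is a sketch under stated assumptions, not the authors' code: the initial learning rate of 1e-3 is an assumed Adam default (the paper does not report it), and the bookkeeping is simplified (no minimum improvement delta, cooldown or learning-rate floor). If the model was built in Keras, which the paper does not state, the corresponding callbacks would be `EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True)` and `ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=5)`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlateauSchedule:
    """Simplified early-stopping + reduce-on-plateau bookkeeping (illustrative only)."""

    lr: float = 1e-3          # ASSUMED initial Adam learning rate (not reported in the paper)
    factor: float = 0.2       # multiply the learning rate by this after a plateau
    lr_patience: int = 5      # epochs without improvement before reducing the learning rate
    stop_patience: int = 15   # epochs without improvement before stopping
    best_loss: float = float("inf")
    best_epoch: int = -1      # epoch whose weights would be restored at the end
    wait_lr: int = 0
    wait_stop: int = 0
    lr_history: List[float] = field(default_factory=list)

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            # New best: remember the epoch (its weights would be checkpointed) and reset counters.
            self.best_loss, self.best_epoch = val_loss, epoch
            self.wait_lr = self.wait_stop = 0
        else:
            self.wait_lr += 1
            self.wait_stop += 1
            if self.wait_lr >= self.lr_patience:
                # Plateau: shrink the learning rate and restart the plateau counter.
                self.lr *= self.factor
                self.wait_lr = 0
        self.lr_history.append(self.lr)
        return self.wait_stop >= self.stop_patience


def run(val_losses: List[float]) -> PlateauSchedule:
    """Feed a sequence of per-epoch validation losses through the schedule."""
    schedule = PlateauSchedule()
    for epoch, loss in enumerate(val_losses):
        if schedule.step(epoch, loss):
            break
    return schedule


if __name__ == "__main__":
    # Hypothetical loss curve: improves for 3 epochs, then plateaus.
    s = run([0.60, 0.50, 0.45] + [0.46] * 20)
    print("stopped at epoch", len(s.lr_history) - 1, "best epoch", s.best_epoch)
```

With this hypothetical curve, training stops at epoch 17 (15 epochs after the best epoch, 2), and the learning rate has been cut three times, at epochs 7, 12 and 17, so the weights from epoch 2 would be the ones restored.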
G. Evaluation Metrics
• Accuracy: Accuracy was tracked throughout training for both the training and validation sets.
• Confusion Matrix: Generated to visualize the performance of the model on the test set.
• ROC and AUC: The ROC curve was plotted to examine the trade-off between drowsiness detection and non-drowsiness detection. To facilitate comparison and provide a scalar measure of the model's classification performance, the area under the curve was also computed.

C. ROC Curve
• ROC Curve: The ROC curve in Fig. 4 displays the performance trade-offs, together with the AUC (Area Under the Curve) score as an overall performance measure. It helps identify the classification threshold that gives a better drowsiness detection rate at an acceptable cost in non-drowsiness detection.
The final results of the model are shown in Fig. 5 and Fig. 6, which represent the drowsy and non-drowsy states, respectively.
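The accuracy, confusion-matrix, ROC and AUC metrics listed above can be computed without any ML library, as the minimal sketch below shows. It assumes that label 1 means "drowsy" (the positive class) and that the model outputs a drowsiness score per sample; the threshold-selection rule (Youden's J statistic, TPR minus FPR) is our illustrative choice, since the paper does not state how a threshold is picked from the ROC curve.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TN, FP, FN, TP); label 1 = drowsy is ASSUMED to be the positive class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)


def roc_curve(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    """Sweep the decision threshold from +inf down through every distinct score."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    fpr, tpr = [], []
    for thr in thresholds:
        pred = [1 if s >= thr else 0 for s in scores]
        _, fp, _, tp = confusion_matrix(y_true, pred)
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr, thresholds


def auc(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule (fpr is non-decreasing)."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr)))


def youden_threshold(fpr: Sequence[float], tpr: Sequence[float], thresholds: Sequence[float]) -> float:
    """Threshold maximising TPR - FPR (one common way to choose an operating point)."""
    best = max(range(len(thresholds)), key=lambda i: tpr[i] - fpr[i])
    return thresholds[best]


if __name__ == "__main__":
    y = [0, 0, 1, 1]            # hypothetical ground truth
    p = [0.1, 0.4, 0.35, 0.8]   # hypothetical drowsiness scores
    fpr, tpr, thr = roc_curve(y, p)
    print(auc(fpr, tpr))  # 0.75
    print(confusion_matrix(y, [1 if s >= 0.5 else 0 for s in p]))  # (2, 0, 1, 1)
```

On the toy scores, the ROC points are (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an AUC of 0.75; sweeping every distinct score, rather than a fixed grid, makes the curve exact for the given data.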
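The three per-frame features of Section III-A, and the idea from the Introduction of flagging blinks and yawns when a ratio stays beyond a limit over consecutive frames, can be sketched with plain 2-D landmark coordinates. The EAR below is the common six-point definition (two vertical eyelid distances over twice the horizontal eye width), and MAR mirrors it with one vertical and one horizontal lip distance. The paper does not give its landmark indices or its head-pose formula, so `head_pose_ratio` (the nose-to-chin vertical offset normalised by the inter-ocular distance), `count_events` and the example thresholds are hypothetical; mapping MediaPipe FaceMesh's 468 landmarks onto these points is left to the caller.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates; y grows downwards


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """Six-point EAR: p1/p4 are the eye corners, p2,p3 the upper lid, p6,p5 the lower lid."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    return vertical / (2.0 * math.dist(p1, p4))


def mouth_aspect_ratio(left: Point, right: Point, top: Point, bottom: Point) -> float:
    """Mouth height (upper to lower lip) over mouth width (corner to corner)."""
    return math.dist(top, bottom) / math.dist(left, right)


def head_pose_ratio(nose: Point, chin: Point, eye_left: Point, eye_right: Point) -> float:
    """HYPOTHETICAL nodding proxy: vertical nose-to-chin offset / inter-ocular distance.
    The projected offset is expected to shrink as the head pitches forward."""
    return (chin[1] - nose[1]) / math.dist(eye_left, eye_right)


def feature_vector(landmarks: Sequence[Point], ear: float, mar: float, head: float) -> List[float]:
    """Flatten the landmarks and append the three scalar features (cf. Section III-B)."""
    return [c for point in landmarks for c in point] + [ear, mar, head]


def count_events(series: Sequence[float], threshold: float, min_frames: int, above: bool) -> int:
    """HYPOTHETICAL helper: count runs of >= min_frames consecutive frames beyond threshold,
    e.g. EAR below a blink threshold, or MAR above a yawn threshold for longer."""
    events, run = 0, 0
    for value in series:
        hit = value > threshold if above else value < threshold
        if hit:
            run += 1
        else:
            if run >= min_frames:
                events += 1
            run = 0
    # A run still open at the end of the series also counts.
    return events + (1 if run >= min_frames else 0)


if __name__ == "__main__":
    eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
    print(eye_aspect_ratio(eye))
    ears = [0.3, 0.3, 0.1, 0.1, 0.3, 0.1, 0.3]  # hypothetical per-frame EAR values
    print(count_events(ears, threshold=0.2, min_frames=2, above=False))
```

Normalising by the eye width or inter-ocular distance keeps each ratio roughly independent of how far the driver sits from the camera, which is why the features are ratios rather than raw pixel distances.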
TABLE II
COMPARISON OF PROPOSED WORK WITH PREVIOUS STUDIES